Abstract


In this article, we propose a domain-specific language, GUBS (Genomic Unified Behaviour Specification), dedicated to the behavioural specification of synthetic biological devices, viewed as discrete open dynamical systems. GUBS is a rule-based declarative language. In contrast to a closed system, a program is always a partial description of the behaviour of the system. The semantics of the language accounts for the existence of hidden, non-specified actions that may alter the behaviour of the programmed devices. The compilation framework follows a scheme similar to automated theorem proving, with the aim of improving the safety of synthetic biological design.
1 Introduction


Synthetic biology is an emerging scientific field combining the investigative nature of biology with the constructive nature of engineering [30] to design synthetic biological systems. The issue is to devise new functionalities or behaviours that do not exist in nature. Hence, the field of synthetic biology is looking for principles and tools to make biological devices interoperable and programmable [27]. Synthetic biology projects first focused on the design and improvement of small genetic devices comparable to logic gates in electronic circuits [31, 12]. More recently, projects have attempted to develop large bio-systems integrating different devices, with the long-term goal of designing de novo synthetic genomes [23]. In this endeavour, computer-aided design (CAD) environments play a central role by providing the features required to engineer systems: specification, analysis, and tuning [5, 28, 38, 14]. Pioneering applications in the iGEM competition show the valuable potential of such environments.


Currently, the design of a synthetic genome specifies the structural assembly of DNA sequences (BioBricks), as in GENOCAD [8]. Although this description is indispensable to provide a finalized specification of devices, its abstraction level seems inappropriate for tackling large bio-systems. The size of the programs required for sequence description likely makes the task error-prone and infeasible. In the same way as large software cannot be programmed in binary, large biological systems cannot be described as a DNA sequence assembly. Scaling up the complexity of synthetic biological systems therefore requires complementing the structural description with an additional, abstract programming layer based on high-level programming languages, and harnessing the automatic conversion of the design specification into a DNA sequence, as compilers do. High-level programming languages for synthetic biology have been announced as a key milestone for the second wave of synthetic biology, to overcome the complexity of large synthetic system design [30]. Nonetheless, in this domain, language technology is still in its infancy, and transforming this vision into reality remains a daunting challenge.


Such a high-level language should describe devices in terms of functionalities, offering the ability to program the behaviour directly instead of the structure supporting this behaviour. Indeed, behaviour specification contributes to accurately documenting the device by adding its behavioural description, to assessing its functionality automatically and formally, notably by generating test-benches from this specification, and to gaining relative independence from the technology, because different biological structures can carry out the same functionality. In this framework, the components are selected and organized automatically or semi-automatically at compile time to generate a structural description of the device whose behaviour complies with the specified function. A similar approach has already been adopted in hardware design, with languages such as VHDL [1] or VERILOG [37], to overcome the growing complexity of electronic circuits. However, the major difference in synthetic biology relates to the openness of biological systems. Hence, we propose a language dedicated to synthetic biology, based on a behavioural specification that handles the openness of systems.


More precisely, GUBS is a rule-based declarative language dedicated to the behavioural specification of discrete open dynamical systems for synthetic biology, which interact with their environment. GUBS defines behaviours symbolically to provide relative independence from structures, postponing the selection of biological components until the compile phase. Within this framework, the compiler translates the behavioural specification into a structural description of a device whose behaviour carries the functional features defined by a program. The proposed compilation method is inspired by automated theorem proving.

After reviewing related work on languages dedicated to systems and synthetic biology (Section 2), we introduce the main features of the GUBS language (Section 3) and define its semantics based on hybrid logic (Section 4). Then, we detail the proof-based principles governing the compilation (Section 5), illustrated with a complete example (Section 6).


2 Related Work 


Several domain-specific languages have been developed to model and simulate biological systems. Based on process calculi, originally used to model process concurrency, several rule-based languages model protein interactions [29, 16, 11]. Another approach is based on logic, such as BIOCHAM [9], which formalizes the temporal properties of a biological system. As these languages are dedicated to simulation, their objective is to close the systems, because simulations need to integrate all the characteristics of the analysed systems. By comparison, the purpose of GUBS is different, since the issue is to represent the behaviour of a synthetic device in an organism; the openness of biological systems must therefore be reflected in the semantics of the language.

In synthetic biology, structural description languages [14, 28, 5] allow well-formed genome sequences to be specified modularly and hierarchically by grammars. Although the sequence description is necessary, the programmer must anticipate beforehand the behaviour of the device to be designed. Besides, the behavioural design is not included in the program, although it is what initially motivates it. In GUBS, the design is driven by a behaviour description, and sequence selection is postponed until the compile phase. Moreover, the size of the structural description is also subject to a combinatorial explosion as the complexity of the programmed systems increases.

Amorphous programming languages have also been investigated to specify biological devices at the scale of cell colonies, considered here as a possible computing medium for amorphous programs. J. Beal [4] demonstrates a proof of concept of this approach in PROTO, showing the feasibility of an automatic compile chain. In GUBS, the compile chain is based on rewriting rules whose correctness has been formally proved with regard to a semantics describing the constraints of an open system.

Developing a language for biological systems involves considering several unknowns due to their openness: lack of knowledge about all the interactions in biological circuits, and imprecise definition of initial conditions. We only know the result of a chain of effects. Hence, the major constraint for programming open systems seems to be the following: how to provide a language expressive enough to describe the dynamics of such systems, yet simple enough to capture the essence of the biological questions in a small program, so that large biological systems can be programmed with humanly achievable programs.

In the future, design in synthetic biology will certainly require different programming layers based on different paradigms, addressing the integration levels of biological systems. In a tower of languages, starting from a language with collective operations on cell colonies, using an amorphous programming language such as PROTO [4] or a language for dynamical systems with dynamical structures such as MGS [22], and ending with a structural description programmed in a grammar-based language, the GUBS language occupies the intermediate level dedicated to the behavioural programming of cell entities.


3 GUBS Language 


In this section, we describe the main features of GUBS. Informally, a GUBS program describes the expected observed behaviour of a biological component. A sequence of observations must comply with a sequence of events related by a causal chain specified in a GUBS program.


Agents, attributes and states. The agents represent the biological objects. Their different observable states characterize their different behaviours, and these behaviours define their different capacities for acting on the state of the other agents. It is worth noting that these states are characterized symbolically by a set of attributes identifying the different capacities. The real significance of the attributes is a matter of convention depending on the targeted realization (e.g., protein pathways, gene networks) and will be addressed through examples. For instance, the regulatory activity of a gene is observationally related to thresholds of RNA transcript concentration. At a given threshold, a gene regulates a given set of genes, whereas at another one the regulation applies to another set of genes (see Figure 1). The different thresholds define the levels of gene activity leading to different regulatory activities. For example, if we identify three different kinds of regulatory activity for a gene G, the state of the gene will be defined by three different attributes {Low, Mid, High} characterizing three possible behaviours symbolically. Thus, G(Low) expresses the fact that agent G is in state Low and is then ready for the action corresponding to this attribute. In some cases, a single state is sufficient to qualify the capacity for action of the agent. The agent is then identified with its capacity, and G simply means that agent G is available.

By contrast, $G(\overline{Low})$ signifies that the state of the agent differs from Low ($\overline{G}$ when an agent has a single capacity). It is worth pointing out that not being in a state defined by an attribute does not necessarily mean that the agent is in a state defined by another attribute. Indeed, for open systems, the state of the agents could be of any sort, not necessarily belonging to the predefined attributes.

Two kinds of relations on attributes are defined: an order, <, meaning "less capacity than", and an inequality, #, meaning "different capacity than". Thus, Low < Mid implies that the capacity for action of Mid includes the capacity related to Low. Usually, in a gene regulatory model [17], the set of genes regulated at a given level is also regulated at a higher level. By contrast, in signalling pathways, the phosphorylation of a protein induces a conformational change of its structure, leading to a specific signalling potentiality that does not occur in the unphosphorylated conformation. Assuming that Phos and UnPhos respectively represent the phosphorylated and unphosphorylated conformations of protein P, we have Phos # UnPhos. Then, P(Phos) implicitly implies $P(\overline{UnPhos})$. The attributes and the relations between attributes are declared as follows: G : {Low < Mid, Mid < High}, P : {Phos # UnPhos}. When the relations are unknown, a plain set of attributes replaces them, and no specific relation is set between the attributes.
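As an illustration only, the Python sketch below reads the two relations as constraints on the set of attributes observed for one agent at a given instant: a1 < a2 as "observing a2 implies observing a1", and a1 # a2 as mutual exclusion, which is how we read their translation into hybrid logic in Section 4. The tuple encoding and the function name `check_relations` are our own assumptions, not part of GUBS.

```python
from typing import Iterable, Set, Tuple

# Hypothetical encoding of a declared relation: (a1, op, a2) with op "<" or "#".
Relation = Tuple[str, str, str]


def check_relations(observed: Set[str], relations: Iterable[Relation]) -> bool:
    """Check the attributes observed for one agent against its declared relations.

    a1 < a2 : observing a2 implies observing a1 (a2 includes the capacity of a1).
    a1 # a2 : a1 and a2 exclude each other.
    """
    for a1, op, a2 in relations:
        if op == "<":
            if a2 in observed and a1 not in observed:
                return False
        elif op == "#":
            if a1 in observed and a2 in observed:
                return False
        else:
            raise ValueError(f"unknown relation {op!r}")
    return True


G_RELATIONS = [("Low", "<", "Mid"), ("Mid", "<", "High")]
P_RELATIONS = [("Phos", "#", "UnPhos")]

print(check_relations({"Low", "Mid", "High"}, G_RELATIONS))  # True
print(check_relations({"High"}, G_RELATIONS))                # False: High requires Mid
print(check_relations({"Phos"}, P_RELATIONS))                # True
print(check_relations({"Phos", "UnPhos"}, P_RELATIONS))      # False
```

Under this reading, a gene observed at level High is also regarded as being at levels Mid and Low, which matches the inclusion of capacities expressed by <.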

Finally, the description of the agent state is extended to a collection of agent states as follows: $g_1 + \dots + g_n$, meaning that all the agent states $g_i$ are observed simultaneously.


Constants and variables. In GUBS, two kinds of agents are distinguished: constants and variables. The constants designate the real predefined objects in a corpus of knowledge. In biology, the constants may refer to proteins or genes of interest. For example, the agent LacZ refers to the LacZ protein or gene. By convention, their names start with a capital letter. The variables refer to an abstraction of these predefined objects and can potentially be replaced (substituted) by any constant. By convention, variable names start with a lower-case letter.


Trace, event, and history. A GUBS program describes a behaviour, and its interpretation is based on observations of the designed systems. The issue is thus to formalize the notion of behaviour observation. To this end, we focus on the notion of a trace, which symbolically represents the evolution of some quantities related to the agents of interest through the evolution of the agents' states. A trace can be obtained from experiments by establishing a correspondence between measurements of some quantities (e.g., RNA transcript concentration) and attributes of agents. Formally, a trace, $(T_t)_{1 \le t \le m}$, is a finite sequence of agent state sets, where each set contains all the agent states observed at a given instant. For instance, the evolution of a concentration from Low to High for G may be described by the following trace of 6 instants: ({G(Low)}, {G(Low)}, {G(Mid)}, {G(Mid)}, {G(Mid)}, {G(High)}). However, not all the events in a trace are necessarily relevant to the behaviour description. For example, if we focus on the evolution from Low to High for G, we may decide arbitrarily that only three events are relevant for the behaviour description: G(Low), then G(Mid), and finally G(High), without accounting for the intermediate evolution stages occurring in between. Behaviour recognition thus always emphasizes the key events in a trace, which entails contracting the trace to show their succession. Such a contracted series is called a consistent history of the expected behaviour. Generally speaking, a history is related to a chronological division of a trace into periods, where the event of a period gathers all the agent states occurring at each instant of that period. A history is then a sequence of these event sets. Given a trace $(T_t)_{1 \le t \le m}$ and a chronological division $(d_j)_{1 \le j \le n}$ such that $d_j < d_{j+1}$, corresponding to the sequence of the starting dates of the periods, the history is the sequence of agent states occurring in each period, $(H_j)_{1 \le j < n}$, such that $H_j = \bigcup_{d_j \le t < d_{j+1}} T_t$. Hence, a consistent history is purposely designed to point out the characteristic event steps of a behaviour description.

In the previous example, a chronological division of the trace leading to a history consistent with the expected evolution from Low to High for G is (1, 3, 6, 7), which corresponds to the discrete time intervals ([1, 2], [3, 5], [6, 6]); step 7 is inserted as an extra step to comply with the definition of the chronological division. The resulting history is ({G(Low)}, {G(Mid)}, {G(High)}). Notice that (1, 2, 4, 7) also fits. However, the chronological division (1, 3, 7) leads to an inconsistent history because the levels Mid and High are not explicitly distinguished as two separate steps, so the history does not follow the expected progression from Low to High. The formal definition of consistency in the scope of the semantics is given in Section 4.
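The contraction of a trace into a history, $H_j = \bigcup_{d_j \le t < d_{j+1}} T_t$, can be sketched in a few lines of Python. This is a hypothetical helper, not part of GUBS; as in the example, we assume that the division starts at 1 and ends with the extra step m + 1.

```python
from typing import FrozenSet, List, Sequence

Event = FrozenSet[str]  # all agent states observed at one instant


def history(trace: Sequence[Event], division: Sequence[int]) -> List[Event]:
    """Contract a trace (T_t), 1 <= t <= m, into a history (H_j), 1 <= j < n.

    `division` lists the 1-based starting dates d_1 < ... < d_n of the periods;
    we assume d_1 = 1 and d_n = m + 1 (the extra step).
    Each H_j is the union of the T_t with d_j <= t < d_{j+1}.
    """
    m = len(trace)
    if any(a >= b for a, b in zip(division, division[1:])):
        raise ValueError("division must be strictly increasing")
    if division[0] != 1 or division[-1] != m + 1:
        raise ValueError("division must start at 1 and end at m + 1")
    return [
        frozenset().union(*trace[d - 1 : d_next - 1])
        for d, d_next in zip(division, division[1:])
    ]


def g(level: str) -> Event:
    return frozenset({f"G({level})"})


trace = [g("Low"), g("Low"), g("Mid"), g("Mid"), g("Mid"), g("High")]
print(history(trace, [1, 3, 6, 7]))
# [frozenset({'G(Low)'}), frozenset({'G(Mid)'}), frozenset({'G(High)'})]
print(len(history(trace, [1, 3, 7])))  # 2: Mid and High fall into the same period
```

The second call shows why (1, 3, 7) is rejected as a consistent history: its last event merges G(Mid) and G(High).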


Behavioural dependence and observation spot. A behavioural dependence identifies a relation between behaviours as a causal relation on events. Basically, the dependences should define the control of agents on each other. However, the definition of causality also needs to be adapted to the openness of a system. A historical definition of causality, proposed by Hume [24], is formulated in terms of regularity on events: "[we may define] a cause to be an object, followed by another, and where all the objects similar to the first are followed by objects similar to the second". Although this definition appropriately characterizes the notion of control, the openness of the system requires accounting for environmental actions that may alter the causal dependence chain. For example, a programmed activation $G_1 \xrightarrow{+} G_2$ may be contradicted by an existing inhibition $G_3 \xrightarrow{-} G_2$ addressing the same target gene $G_2$. Hence, while $G_1$ is active, $G_2$ may turn out not to be active because the regulatory strength of $G_3$ is greater than that of $G_1$, the expected activation being contradicted by a hidden inhibition. Pushed to the limit, this consideration prevents describing any behaviour causally, because any programmed action can be unexpectedly preempted by an external one. Adapting Hume's definition, we define causality by the occurrence of its effect: if the effect is observed, the causal relation is effective. This differs from the basic approach, which considers that if the cause is observed, the causal relation is effective.

By ensuring that the design describes a new functionality that is not observed naturally, the observed effects become the sole events indicating the triggering of causal dependences. Indeed, the observation of a cause cannot be considered as an indicator, because its action could be preempted by external events. In other words, the proposed definition of causality reflects the fact that the device may not be functional due to an external intervention. However, the functionality is still correctly specified, because this eventuality is accounted for in the definition of causality, which is validated by the observation of effects. Besides, as no cause external to the description is assumed to trigger the effects of the dependences for the new functionality, over-determination by unknown causes is assumed to be prevented, thus ensuring that the program is the sole device entailing the expected effects in the biological system. Hence, the causal dependence is governed by the effect, leading to the following definition of the dependence: "if effect e occurs, then c has occurred". Moreover, the scope of the future (resp. past) is narrowed to the closest future (resp. past) period, representing the fact that a response is always expected within a given delay. Notice that the proposed definition circumvents the aforementioned problem illustrated by the hidden inhibition, because if the effect does not occur, the question of the existence of a cause is meaningless. This definition is somewhat equivalent to the causal claims proposed by Lewis [26] in terms of counterfactual conditionals, i.e., "If c had not occurred, e would not have occurred".

Three behavioural dependences are defined in GUBS: the normal dependence, denoted by $\odot\!\!\rightarrow$, the persistent dependence, denoted by $\circledcirc\!\!\rightarrow$, and the residual dependence, denoted by $\circledast\!\!\rightarrow$. These dependences are primitive in the sense that none of them can be expressed by the others without weakening its properties (see Table 5). Informally, for the normal dependence, the cause precedes the effect provided the effect is observed; for the persistent dependence, the cause still precedes the effect but is maintained while the effect is observed; and for the residual dependence, the effect is maintained even though the cause has disappeared. These dependences symbolize common biological interactions. For instance, in genetic engineering, a recombination permanently enables the emergence of a regulated gene or a hereditary trait. Such a mechanism typifies the residual dependence in biology. The relations between gene expressions at steady state are symbolized by the persistent dependence. The behavioural dependences are defined as follows (see Section 4 for their formalization):


$c \odot\!\!\rightarrow e$: if e occurs, then c occurred in the closest past.


$c \circledcirc\!\!\rightarrow e$: if e occurs, then c occurred in the closest past and also occurs currently.


$c \circledast\!\!\rightarrow e$: if e occurs, then either e occurred in the closest past, or e did not occur in the closest past and c necessarily occurred in the closest past.
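To make the three definitions concrete, the following Python sketch checks a dependence c → e on a linear history. It is a simplified reading under our own assumptions: the closest past of a step is the immediately preceding event set, the first step has no past, and contexts, compartments and negated states are ignored. The function name `satisfied` is hypothetical.

```python
from typing import FrozenSet, Sequence

Event = FrozenSet[str]


def satisfied(kind: str, cause: Event, effect: Event, hist: Sequence[Event]) -> bool:
    """Check c -> e on a linear history, reading "closest past" as the previous
    event set (simplification: no contexts, no compartments, no negation).

    A set of agent states (c1 + ... + cn) occurs when all of them are observed.
    """
    for j, event in enumerate(hist):
        if not effect <= event:
            continue  # a dependence only constrains the steps where e occurs
        past = hist[j - 1] if j > 0 else None  # the first step has no past
        cause_before = past is not None and cause <= past
        if kind == "normal":
            ok = cause_before
        elif kind == "persistent":
            ok = cause_before and cause <= event
        elif kind == "residual":
            effect_before = past is not None and effect <= past
            ok = effect_before or cause_before
        else:
            raise ValueError(f"unknown dependence {kind!r}")
        if not ok:
            return False
    return True


c, e = frozenset({"c"}), frozenset({"e"})
pulse = [c, e]       # the cause disappears when the effect appears
held = [c, c | e]    # the cause is maintained with the effect
memory = [c, e, e]   # the effect outlives the cause

for kind in ("normal", "persistent", "residual"):
    print(kind, [satisfied(kind, c, e, h) for h in (pulse, held, memory)])
# normal [True, True, False]
# persistent [False, True, False]
# residual [True, True, True]
```

The three histories separate the dependences in the way the informal descriptions suggest: only the residual dependence tolerates an effect that outlives its cause.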


Figure 1: The curves represent the typical behaviours of the causal dependences based on the time evolution of a quantity (q) related to agents c and e (e.g., RNA transcripts for gene regulation). The symbolic agent states c and e are both associated here with the maximal threshold of the quantity. The symbolic trace (T) results from a periodic sampling of the evolution, identifying whether c or e occurs. A consistent history (H) complying with a causal dependence definition is represented below the trace. The first graphic illustrates the normal dependence $c \odot\!\!\rightarrow e$, the second the persistent one $c \circledcirc\!\!\rightarrow e$, and the third the residual one $c \circledast\!\!\rightarrow e$.

Figure 1 exemplifies the correspondence between experimental traces, symbolic traces and histories for the causal dependences. All the dependences are extended to a set of causes and a set of consequences, i.e., $c_1 + \dots + c_n \odot\!\!\rightarrow e_1 + \dots + e_m$. For example, let us define the activation and the inhibition as follows:

$$g_1 \xrightarrow{+} g_2 \;\equiv\; g_1 \odot\!\!\rightarrow g_2,\ \overline{g_1} \odot\!\!\rightarrow \overline{g_2} \qquad\text{and}\qquad g_1 \xrightarrow{-} g_2 \;\equiv\; g_1 \odot\!\!\rightarrow \overline{g_2},\ \overline{g_1} \odot\!\!\rightarrow g_2.$$

Then, the program depicting a negative regulatory circuit with two genes, i.e., $g_1 \xrightarrow{+} g_2,\ g_2 \xrightarrow{-} g_1$, is: $\{g_1 \odot\!\!\rightarrow g_2,\ \overline{g_1} \odot\!\!\rightarrow \overline{g_2},\ \overline{g_2} \odot\!\!\rightarrow g_1,\ g_2 \odot\!\!\rightarrow \overline{g_1}\}$.
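A minimal Python sketch of the two abbreviations above, in which a literal is encoded as an (agent, polarity) pair and each (cause, effect) pair stands for a normal dependence; the encoding and the names `activation`, `inhibition` and `show` are illustrative assumptions. Expanding the activation of g2 by g1 and the inhibition of g1 by g2 yields the four dependences of the negative circuit.

```python
from typing import List, Tuple

Lit = Tuple[str, bool]         # ("g1", True) is g1, ("g1", False) is g1-bar
Dependence = Tuple[Lit, Lit]   # normal dependence: (cause, effect)


def activation(src: str, tgt: str) -> List[Dependence]:
    """src -(+)-> tgt  ==  src -> tgt, src-bar -> tgt-bar."""
    return [((src, True), (tgt, True)), ((src, False), (tgt, False))]


def inhibition(src: str, tgt: str) -> List[Dependence]:
    """src -(-)-> tgt  ==  src -> tgt-bar, src-bar -> tgt."""
    return [((src, True), (tgt, False)), ((src, False), (tgt, True))]


def show(lit: Lit) -> str:
    agent, positive = lit
    return agent if positive else f"~{agent}"


negative_circuit = activation("g1", "g2") + inhibition("g2", "g1")
print(", ".join(f"{show(c)} -> {show(e)}" for c, e in negative_circuit))
# g1 -> g2, ~g1 -> ~g2, g2 -> ~g1, ~g2 -> g1
```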


The observation spots describe the set of observations expected in a trace. For instance, observing that gene G is at level High is written Obs :: G(High). As the activation of a dependence relies on the observation of its effect, the observation spots are used to determine which effects must necessarily be observed. To some extent, observation spots can be assimilated to experimental requirements. For example, in the negative regulatory circuit, the characteristic observation spots are $Obs_1 :: (g_1 + \overline{g_2})$ and $Obs_2 :: (\overline{g_1} + g_2)$.


Compartment & Context. A compartment encloses a set of dependences, making them local to the compartment. For instance, $C\{g_1 \odot\!\!\rightarrow g_2\}$ describes a normal dependence occurring in compartment C. The compartments are hierarchically organized, and all compartments are included in another one except for the outermost one. Although the compartments directly refer to the compartmentalized cellular organization (e.g., nucleus, mitochondria), they are also used to emphasize the isolation of some interactions by syntactically enclosing the dependences in a compartment. C.s refers to an agent state s in compartment C.


A context refers to a stimulus acting on the system, such as environmental conditions or external signalling. The application of a context c to a set of dependences b is written [c]b, where c is either a variable or a constant. This means that the dependences of b are triggered when the context c is present. For instance, Ye et al. [39] recently explored opto-genetic signalling to control the expression of target transgenes. Blue light induces the expression of a transgene (tg) via a signalling cascade leading to the binding of the NFAT transcription factor to a specific promoter ($P_{NFAT}$). The following program, using a context, summarizes the process: $[BlueLight]\{NFAT \circledcirc\!\!\rightarrow tg\}$. A context can be decomposed into several contexts, $[k_1, \dots, k_n]b$, meaning that all the conditions must be met to trigger the dependences of b. The interpretation is equivalent to context cascading, $[k_1][k_2]\dots[k_n]b$. Moreover, the observation spots and the attribute definitions are context insensitive.
4 Semantics of GUBS


The interpretation of a GUBS program is a formula of multi-modal hybrid logic with the "Always" operator, $\mathcal{H}(\mathsf{A}, @)$. Formally, a GUBS program defines a set of causal relations and a set of observation spots, either of which can be empty. The program is translated into a hybrid logic formula.


Hybrid logic. In what follows, we recall the formal syntax and semantics of hybrid logic. Hybrid logic [6, 7] offers the possibility to name worlds with new symbols called nominals. They are used in satisfaction operators $@_a$: the formula $@_a \phi$ asserts that $\phi$ is satisfied at the unique point named by the nominal a, identifying the truth value of a formula at this particular point. Given a set of propositional symbols PROP, a set of relational symbols REL, and a set of nominals NOM disjoint from PROP, the set of well-formed formulas in the signature (PROP, NOM, REL) is defined as follows:

$$\phi ::= \top \mid p \mid a \mid \neg\phi \mid \phi \wedge \phi \mid @_a \phi \mid \langle k \rangle \phi \mid \langle k \rangle^{-} \phi \mid \mathsf{A}\phi$$


with $p \in \mathrm{PROP}$, $a \in \mathrm{NOM}$ and $k \in \mathrm{REL}$. Moreover, the syntax is extended in the usual way to the other logical operators $\bot$, $\vee$, $\Rightarrow$, $[k]$ and $\mathsf{E}$.

The semantics of $\mathcal{H}(\mathsf{A}, @)$ is based on Kripke model satisfaction (Table 1). $M, w \Vdash \phi$ is interpreted as the satisfaction of a formula $\phi$ by a model M at world w, where $\Vdash$ stands for the realizability relation (i.e., "is a model of"). A model validates a formula, denoted by $M \Vdash \phi$, if and only if the formula is satisfied at all the worlds of the model (i.e., $\forall w \in \mathrm{Dom}\,M : M, w \Vdash \phi$).


Definition 1 (Kripke model). A Kripke model is defined as a structure

$$M = (W, (R_k)_{k \in \tau}, V)$$

where $W = \mathrm{Dom}\,M$ is a non-empty set of worlds, $\tau \subseteq \mathrm{REL}$ is a subset of relational symbols denoting the modalities (i.e., edge labels), $R_k \subseteq W \times W$, $k \in \tau$, is an accessibility relation, and $V : (\mathrm{PROP} \cup \mathrm{NOM}) \to 2^W$ is an interpretation assigning to each nominal and propositional variable a set of worlds, such that any nominal addresses at most one world (i.e., $\forall a \in \mathrm{NOM} : |V(a)| \le 1$). By convention, R stands for the union of the accessibility relations, $R = \bigcup_{k \in \tau} R_k$.
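The derived operators are $\bot = \neg\top$, $\phi \vee \psi = \neg(\neg\phi \wedge \neg\psi)$, $\phi \Rightarrow \psi = \neg\phi \vee \psi$, $[k]\phi = \neg\langle k\rangle\neg\phi$ and $\mathsf{E}\phi = \neg\mathsf{A}\neg\phi$.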
The modal theory of a model M with respect to a set of formulas F, $TH_F(M)$, is the set of formulas of F validated by M, i.e., $TH_F(M) = \{\phi \in F \mid M \Vdash \phi\}$. $KS(\phi)$ denotes the set of all models validating $\phi$, i.e., $KS(\phi) = \{M \mid M \Vdash \phi\}$.


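$$
\begin{array}{lcl}
M, w \Vdash \top & \text{iff} & \text{true}\\
M, w \Vdash a & \text{iff} & w \in V(a),\ a \in \mathrm{NOM} \cup \mathrm{PROP}\\
M, w \Vdash \neg\phi & \text{iff} & M, w \nVdash \phi\\
M, w \Vdash \phi_1 \wedge \phi_2 & \text{iff} & M, w \Vdash \phi_1 \text{ and } M, w \Vdash \phi_2\\
M, w \Vdash @_a\phi & \text{iff} & \exists w' \in W : M, w' \Vdash \phi \text{ and } \{w'\} = V(a)\\
M, w \Vdash \langle k\rangle\phi & \text{iff} & \exists w' \in W : M, w' \Vdash \phi \text{ and } w R_k w'\\
M, w \Vdash \langle k\rangle^{-}\phi & \text{iff} & \exists w' \in W : M, w' \Vdash \phi \text{ and } w' R_k w\\
M, w \Vdash \mathsf{A}\phi & \text{iff} & \forall w' \in W : M, w' \Vdash \phi
\end{array}
$$

Table 1: Hybrid logic interpretation.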
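To make Table 1 concrete, here is a minimal model checker for finite Kripke models, written in Python under our own assumptions: formulas are nested tuples, and the names `Kripke`, `sat`, `validates` and the helper constructors are hypothetical. It follows Table 1 clause by clause. The usage checks the negative regulatory circuit of Section 3 on a four-world oscillation, encoding each normal dependence c → e as e ⇒ ⟨⟩⁻c following Section 4; the relation label "" stands for the empty context set.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Formulas as nested tuples (our own encoding):
# ("top",) ("atom", name) ("not", f) ("and", f, g) ("at", nominal, f)
# ("dia", k, f) ("past", k, f) ("A", f)
Formula = tuple


@dataclass
class Kripke:
    worlds: Set[str]
    rel: Dict[str, Set[Tuple[str, str]]]  # R_k for each modality label k
    val: Dict[str, Set[str]]              # V: propositions and nominals -> worlds

    def sat(self, w: str, f: Formula) -> bool:
        op = f[0]
        if op == "top":
            return True
        if op == "atom":
            return w in self.val.get(f[1], set())
        if op == "not":
            return not self.sat(w, f[1])
        if op == "and":
            return self.sat(w, f[1]) and self.sat(w, f[2])
        if op == "at":    # @_a f: f holds at the (unique) world named a
            return any(self.sat(v, f[2]) for v in self.val.get(f[1], set()))
        if op == "dia":   # <k> f: some R_k-successor of w satisfies f
            return any(self.sat(v, f[2]) for u, v in self.rel.get(f[1], set()) if u == w)
        if op == "past":  # <k>^- f: some R_k-predecessor of w satisfies f
            return any(self.sat(u, f[2]) for u, v in self.rel.get(f[1], set()) if v == w)
        if op == "A":     # universal modality: f holds at every world
            return all(self.sat(v, f[1]) for v in self.worlds)
        raise ValueError(f"unknown operator {op!r}")

    def validates(self, f: Formula) -> bool:
        return all(self.sat(w, f) for w in self.worlds)


def atom(p: str) -> Formula:
    return ("atom", p)


def neg(f: Formula) -> Formula:
    return ("not", f)


def conj(*fs: Formula) -> Formula:
    out = fs[0]
    for f in fs[1:]:
        out = ("and", out, f)
    return out


def implies(f: Formula, g: Formula) -> Formula:
    return neg(conj(f, neg(g)))  # f => g  ==  not (f and not g)


def past(f: Formula) -> Formula:
    return ("past", "", f)  # closest past with the empty context set


g1, g2 = atom("g1"), atom("g2")
circuit = ("A", conj(
    implies(g2, past(g1)),
    implies(neg(g2), past(neg(g1))),
    implies(g1, past(neg(g2))),
    implies(neg(g1), past(g2)),
    ("at", "obs1", conj(g1, neg(g2))),
    ("at", "obs2", conj(neg(g1), g2)),
))

# Oscillation w0 -> w1 -> w2 -> w3 -> w0, with observation spots on w0 and w2.
cycle = Kripke(
    worlds={"w0", "w1", "w2", "w3"},
    rel={"": {("w0", "w1"), ("w1", "w2"), ("w2", "w3"), ("w3", "w0")}},
    val={"g1": {"w0", "w1"}, "g2": {"w1", "w2"}, "obs1": {"w0"}, "obs2": {"w2"}},
)
print(cycle.validates(circuit))  # True
```

Removing the edge from w3 back to w0 breaks validity, since w0 then has no closest past to justify $\neg g_2$ and $g_1$.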


Semantics. A GUBS program is interpreted by a hybrid logic formula. Hence, it is considered observable if and only if its corresponding formula is valid. Validity and satisfiability are defined with respect to a Kripke model (Definition 1) gathering different possible histories. A world in a Kripke model represents an event, defined by a set of agent states at a given point in the history.

The operator $[\,]$ means "observed in all possible closest futures" and $\langle\,\rangle$ means "observed in at least one possible closest future" (resp. $\langle\,\rangle^{-}$, $[\,]^{-}$ for the closest past). Besides, the accessibility relations $(R_K)_{K \in \tau}$ represent a "temporal evolution" with regard to some contexts. Thus, they are indexed by the non-empty subsets of the set of all contexts of a program P, denoted by $\mathcal{K}_P$ (i.e., $\tau = 2^{\mathcal{K}_P} \setminus \{\emptyset\}$). A non-empty set of contexts $\emptyset \subset K \subseteq \mathcal{K}_P$ is then a modality, i.e., $\langle K\rangle$, $[K]$, with $\langle\,\rangle = \langle\emptyset\rangle$ by convention. Agent states are the propositional variables of the formulas, and observation spots are interpreted as nominals identifying worlds.

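Let $(W, \cdot, \Lambda)$ be the set of words W (sequences of compartment names) equipped with concatenation and its neutral element, the empty word $\Lambda$, and let $\mathcal{F}_{\mathcal{H}}$ be the set of well-formed formulas of $\mathcal{H}(\mathsf{A}, @)$. The denotational semantics is defined by four functions: $[\![\cdot]\!] : \mathcal{P} \to \mathcal{F}_{\mathcal{H}}$, $[\![\cdot]\!]_P : \mathcal{P} \to W \to 2^{\mathcal{K}_P} \to \mathcal{F}_{\mathcal{H}}$, $[\![\cdot]\!]_B : \mathcal{B} \to W \to \mathcal{F}_{\mathcal{H}}$ and $[\![\cdot]\!]_R : \mathcal{R} \to W \to \mathcal{F}_{\mathcal{H}}$, where $\mathcal{P}$, $\mathcal{B}$ and $\mathcal{R}$ respectively stand for the set of GUBS programs, the set of agent state sets and the set of relations on attributes. $[\![\cdot]\!]$ is the main function initiating the interpretation. $[\![\cdot]\!]_P$ provides an interpretation for behaviours: causal relations, compartments, contexts and observation spots. $[\![\cdot]\!]_B$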
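$$
\begin{array}{rcl}
[\![\{b\}]\!] &=& \mathsf{A}\big([\![b]\!]_P(\Lambda)(\emptyset)\big)\\
[\![\epsilon]\!]_P(C)(K) &=& \top\\
[\![b_1, b_2]\!]_P(C)(K) &=& [\![b_1]\!]_P(C)(K) \wedge [\![b_2]\!]_P(C)(K)\\
[\![s_1 \odot\!\!\rightarrow s_2]\!]_P(C)(K) &=& [\![s_2]\!]_B(C) \Rightarrow \langle K\rangle^{-}[\![s_1]\!]_B(C)\\
[\![s_1 \circledcirc\!\!\rightarrow s_2]\!]_P(C)(K) &=& [\![s_2]\!]_B(C) \Rightarrow \big([\![s_1]\!]_B(C) \wedge \langle K\rangle^{-}[\![s_1]\!]_B(C)\big)\\
[\![s_1 \circledast\!\!\rightarrow s_2]\!]_P(C)(K) &=& [\![s_2]\!]_B(C) \Rightarrow \big(\langle\,\rangle^{-}[\![s_2]\!]_B(C) \vee \langle K\rangle^{-}[\![s_1]\!]_B(C)\big)\\
[\![g : \{r_1, \dots, r_m\}]\!]_P(C)(K) &=& \bigwedge_{i=1}^{m} \mathsf{A}\,[\![r_i]\!]_R(C.g)\\
[\![o :: s]\!]_P(C)(K) &=& @_o\,[\![s]\!]_B(C)\\
[\![C'\{b\}]\!]_P(C)(K) &=& [\![b]\!]_P(C.C')(K)\\
[\![\,[K]\{b\}\,]\!]_P(C)(K') &=& [\![b]\!]_P(C)(K \cup K')\\
[\![s_1 + \dots + s_n]\!]_B(C) &=& \bigwedge_{i=1}^{n} [\![s_i]\!]_B(C)\\
[\![C'.s]\!]_B(C) &=& [\![s]\!]_B(C.C')\\
[\![g(a)]\!]_B(C) &=& C.g_a\\
[\![g(\overline{a})]\!]_B(C) &=& \neg C.g_a\\
[\![g]\!]_B(C) &=& C.g\\
[\![\overline{g}]\!]_B(C) &=& \neg C.g\\
[\![a_1 < a_2]\!]_R(g) &=& g_{a_2} \Rightarrow g_{a_1}\\
[\![a_1 \# a_2]\!]_R(g) &=& (g_{a_1} \Rightarrow \neg g_{a_2}) \wedge (g_{a_2} \Rightarrow \neg g_{a_1})\\
[\![a]\!]_R(g) &=& \top
\end{array}
$$

Table 2: Semantics of GUBS. In the definitions, a represents an attribute, b a behaviour, g an agent, s a set of agent states or an agent state, r a relation on attributes, C a compartment, K a set of contexts and $\{b\}$ a set of behaviours (i.e., contexts, compartments, dependences, attributes, observation spots).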


defines the interpretation of agents and agent sets. Finally, $[\![\cdot]\!]_R$ corresponds to the interpretation of attribute relations. Table 2 defines these functions. Table 3 describes the interpretations of two GUBS programs: the first program is a negative circuit with two genes, and the other one is a part of the band detector pattern used in Section 6.
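The following Python sketch mirrors part of Table 2, as we read it, as a translator from a small abstract syntax tree to a formula string. Dependences, observation spots, compartments and contexts are covered; attribute declarations are omitted. The AST classes (`State`, `Dep`, `Obs`, `Comp`, `Ctx`) and the ASCII notation (`~` for negation, `=>`, `&`, `|`, `<K>^-` for the closest-past modality indexed by the context set K, `@o` for satisfaction at the nominal o) are our own assumptions, not the GUBS compiler.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple, Union


@dataclass(frozen=True)
class State:    # g, g-bar, g(a) or g(a-bar)
    agent: str
    attr: Optional[str] = None
    negated: bool = False


@dataclass(frozen=True)
class Dep:      # kind is "normal", "persistent" or "residual"
    kind: str
    cause: Tuple[State, ...]
    effect: Tuple[State, ...]


@dataclass(frozen=True)
class Obs:      # observation spot  name :: s
    name: str
    states: Tuple[State, ...]


@dataclass(frozen=True)
class Comp:     # compartment  C{b}
    name: str
    body: Tuple["Behaviour", ...]


@dataclass(frozen=True)
class Ctx:      # context  [k1, ..., kn]{b}
    contexts: FrozenSet[str]
    body: Tuple["Behaviour", ...]


Behaviour = Union[Dep, Obs, Comp, Ctx]


def tr_states(states: Tuple[State, ...], path: Tuple[str, ...]) -> str:
    """[[s1 + ... + sn]]_B(C): conjunction of compartment-qualified literals."""
    parts = []
    for s in states:
        name = s.agent + (f"_{s.attr}" if s.attr else "")
        atom = ".".join(path + (name,))
        parts.append(f"~{atom}" if s.negated else atom)
    return " & ".join(parts)


def tr(b: Behaviour, path: Tuple[str, ...], ctx: FrozenSet[str]) -> List[str]:
    """[[b]]_P(C)(K), returned as a list of conjuncts."""
    if isinstance(b, Dep):
        c, e = tr_states(b.cause, path), tr_states(b.effect, path)
        k = ",".join(sorted(ctx))
        if b.kind == "normal":
            return [f"({e} => <{k}>^-({c}))"]
        if b.kind == "persistent":
            return [f"({e} => ({c} & <{k}>^-({c})))"]
        if b.kind == "residual":
            return [f"({e} => (<>^-({e}) | <{k}>^-({c})))"]
        raise ValueError(f"unknown dependence {b.kind!r}")
    if isinstance(b, Obs):
        return [f"@{b.name}({tr_states(b.states, path)})"]
    if isinstance(b, Comp):
        return [f for x in b.body for f in tr(x, path + (b.name,), ctx)]
    if isinstance(b, Ctx):
        return [f for x in b.body for f in tr(x, path, ctx | b.contexts)]
    raise TypeError(f"unsupported behaviour {b!r}")


def translate(program: List[Behaviour]) -> str:
    """[[{b}]] = A([[b]]_P(Lambda)(emptyset)); the empty program gives A(T)."""
    conjuncts = [f for b in program for f in tr(b, (), frozenset())]
    return f"A({' & '.join(conjuncts) or 'T'})"


g1, g2 = State("g1"), State("g2")
ng1, ng2 = State("g1", negated=True), State("g2", negated=True)
negative_circuit = [
    Dep("normal", (g1,), (g2,)), Dep("normal", (ng1,), (ng2,)),
    Dep("normal", (ng2,), (g1,)), Dep("normal", (g2,), (ng1,)),
    Obs("obs1", (g1, ng2)), Obs("obs2", (ng1, g2)),
]
print(translate(negative_circuit))
print(translate([Ctx(frozenset({"Light"}),
                     (Dep("normal", (State("detect"),), (State("AHL", "low"),)),))]))
# A((AHL_low => <Light>^-(detect)))
```

Because contexts are accumulated by set union while descending the tree, nested contexts $[k_1][k_2]b$ and a combined context $[k_1, k_2]b$ produce the same formula, in line with the cascading equivalence stated in Section 3.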


Observability is based on the interpretation of a program, which translates it into a hybrid logic formula: an observable program corresponds to a valid formula. Hence, we use the tableau method for hybrid logic, which is proved decidable for hybrid logic fragments without the binder. For this fragment, the tableau method is proved to run in exponential time, with a logarithmic lower bound [15]. According to the semantics (Table 2), the resulting formulas are in conjunctive form, with at most 3 disjunctive clauses for persistent causes. Each application of the disjunction rule creates a new branch in the tree formed by
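$$
\begin{array}{l|l}
\textbf{GUBS program} & \textbf{Hybrid logic interpretation}\\ \hline
\{ & \mathsf{A}(\\
\quad g_1 \odot\!\!\rightarrow g_2, & \quad (g_2 \Rightarrow \langle\,\rangle^{-} g_1) \wedge\\
\quad \overline{g_1} \odot\!\!\rightarrow \overline{g_2}, & \quad (\neg g_2 \Rightarrow \langle\,\rangle^{-} \neg g_1) \wedge\\
\quad \overline{g_2} \odot\!\!\rightarrow g_1, & \quad (g_1 \Rightarrow \langle\,\rangle^{-} \neg g_2) \wedge\\
\quad g_2 \odot\!\!\rightarrow \overline{g_1}, & \quad (\neg g_1 \Rightarrow \langle\,\rangle^{-} g_2) \wedge\\
\quad obs_1 :: g_1 + \overline{g_2}, & \quad @_{obs_1}(g_1 \wedge \neg g_2) \wedge\\
\quad obs_2 :: \overline{g_1} + g_2 & \quad @_{obs_2}(\neg g_1 \wedge g_2)\\
\} & )
\end{array}
$$

$$
\begin{array}{l|l}
\textbf{GUBS program} & \textbf{Hybrid logic interpretation}\\ \hline
\{ & \mathsf{A}(\\
\quad AHL : \{low < mid < high\}, & \quad (AHL_{high} \Rightarrow AHL_{mid}) \wedge (AHL_{mid} \Rightarrow AHL_{low}) \wedge\\
\quad [Light]\{detect \odot\!\!\rightarrow AHL(low)\}, & \quad (AHL_{low} \Rightarrow \langle Light\rangle^{-} detect) \wedge\\
\quad [Light]\{detect \odot\!\!\rightarrow AHL(mid)\}, & \quad (AHL_{mid} \Rightarrow \langle Light\rangle^{-} detect) \wedge\\
\quad [Light]\{detect \odot\!\!\rightarrow AHL(high)\} & \quad (AHL_{high} \Rightarrow \langle Light\rangle^{-} detect)\\
\} & )
\end{array}
$$

Table 3: Interpretation of GUBS programs into hybrid logic.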


the tableau resolution, so the complexity resulting from these formulas is in $O(3^n)$, where n is the number of lines in the normalized program.


Consistent history. We now formally define the consistency of a history with regard to models.


Definition 2 (Consistent history). Let $\mathcal{P}_o(M)$ be the set of paths of a model M ending with an observation spot on their last world. A consistent history $M_H$ with regard to a program P is defined as follows:

1. $M_H$ is a model.
2. $\mathcal{P}_o(M_H) = \{M_H\}$.
3. $M_H \Vdash [\![P]\!]$.

Remark 1. Note that if $\mathcal{P}_n(M)$ is the set of all paths in M whose final point is named, then $\forall M_H \in \mathcal{P}_n(M) : M \Vdash [\![P]\!] \Rightarrow M_H \Vdash [\![P]\!]$.


Notice that if a program is validated by a model, all its histories are validated.
5 Compilation


At compile time, a program is transformed into a structure (e.g., a DNA sequence) which, once inserted into a host cell (such as a bacterium), should behave according to the programmed specification. The structure results from the assembly of several devices stored in a library of components (e.g., a parts registry). As the design here relates to a behavioural/functional description, we need to bridge the gap between structural and functional descriptions. This stage is called functional synthesis. The issue is to select a set of components whose assembly preserves the behaviour of the program. To achieve this goal, a GUBS program is associated with each component to describe its behaviour. Thereby, the component assembly corresponds to a program assembly preserving the behaviour of the compiled program. Behaviour preservation is captured by a property called behavioural inclusion, which formalizes the fact that the characteristic observational traits of the specified function must be recognized in traces related to the device experiments. In other words, we can construct histories consistent with the programmed behaviour from histories consistent with the device behaviour description. Behavioural inclusion is defined from the interpretation of the programs, as a logical consequence (Definition 3).


Definition 3 (Behavioural inclusion). A program Q behaviourally includes another program P if and only if the interpretation of the latter is a logical consequence of the interpretation of the former:


$$P \sqsubseteq Q \iff \forall \mathcal{M} : \mathcal{M} \Vdash [Q] \implies \mathcal{M} \Vdash [P].$$


Behavioural inclusion is a pre-order⁴ such that the empty program, denoted by ∅, is a minimum, meaning that a program with no expected behaviour can be observed in all traces. A program whose interpretation equals ⊥ is a maximum. Figure 2 illustrates behavioural inclusion on a particular example.


Observability. It may happen that no history is consistent with a programmed behaviour. For example, the program {Obs :: g, ¬g O> g} is not observable in a trace. Indeed, its interpretation yields the formula A((@_Obs g) ∧ (g → ((⟨⟩¬g) ∧ ¬g))), which is false in all models because world Obs must satisfy both g and ¬g by definition of the persistent dependence.

⁴ A reflexive and transitive relation.

[Figure 2: programs P and Q with their dependences and contexts, and the model graph of Q with observation spots a, b and c.]

Figure 2: Behavioural inclusion example. Consistent histories of P necessarily contain the worlds coloured in grey. From the observation spots a, b and c, the model corresponding to the grey worlds validates the original model. Hence, the behaviour of P is included in the model of Q represented by the entire graph.

A GUBS program is said to be observable if and only if the formula resulting from its interpretation is validated by at least one model. Hence, the interpretation of an unobservable program is a contradiction, and an unobservable program can be assimilated to a programming error. Such errors can be detected at compile time using the tableau method [10], which automatically determines whether a formula is satisfiable. Indeed, GUBS uses a fragment of H(A,@), named HL(@), which is decidable. Observing the behaviour is essential to validate a program and ensure its safety. Hence, the assembled components must always be observable, because a program behaviourally included in an observable program is also observable (Proposition 1).
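To make the branching behind the tableau method concrete, here is a minimal propositional tableau in Python. It is a sketch under a strong simplifying assumption: it only handles literals, conjunction and disjunction in negation normal form, and ignores the modal and hybrid operators (⟨k⟩, [k], @, A, E) of HL(@), so it is not a decision procedure for GUBS formulas. The tuple encoding of formulas is hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding of NNF formulas as tuples:
#   ("var", "g")  stands for g
#   ("not", "g")  stands for ¬g (negation only in front of variables, as in NNF)
#   ("and", f, h) and ("or", f, h) for binary conjunction and disjunction
Formula = tuple


def satisfiable(formula: Formula) -> bool:
    """Propositional tableau: True iff some fully expanded branch stays open."""

    def expand(todo: List[Formula], literals: FrozenSet[Tuple[str, bool]]) -> bool:
        if not todo:
            return True  # everything decomposed without contradiction: open branch
        f, rest = todo[0], todo[1:]
        op = f[0]
        if op in ("var", "not"):
            lit = (f[1], op == "var")
            if (f[1], not lit[1]) in literals:
                return False  # closed branch: it contains both g and ¬g
            return expand(rest, literals | {lit})
        if op == "and":
            # conjunction rule: both conjuncts stay on the same branch
            return expand([f[1], f[2]] + rest, literals)
        if op == "or":
            # disjunction rule: one new branch per disjunct
            return expand([f[1]] + rest, literals) or expand([f[2]] + rest, literals)
        raise ValueError(f"unsupported connective: {op!r}")

    return expand([formula], frozenset())


g, not_g, h = ("var", "g"), ("not", "g"), ("var", "h")
print(satisfiable(("and", g, not_g)))             # False
print(satisfiable(("and", ("or", g, h), not_g)))  # True
```

Each disjunction may double the number of branches to explore, which is where the exponential cost of the tableau resolution comes from.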


Proposition 1. A program behaviourally included in an observable program is observable:

$$\forall P, Q \in \mathcal{P} : (\mathrm{obs}\, Q) \wedge (P \sqsubseteq Q) \implies \mathrm{obs}\, P.$$
5.1 Functional Synthesis 


Functional synthesis is the operation whereby biological components of a library are selected and assembled to generate a device behaviourally including the designed function. The behaviour of each component is described by a GUBS program. In its simplest form, functional synthesis can be considered as a proper substitution of variables by constants. For example, in the activation {G1 O> g2}, g2 will be substituted by gene G2, provided that a component Q describes the activation {G1 O> G2}. However, more complex situations may arise during component selection. For example, if the activation G1 O> G2 only occurs together with another regulation, i.e., Q = {G1 O> G2, G3 O> G4}, then selecting Q adds a supplementary regulation.

Formally, a finite substitution is a set of mappings σ = {v_i ↦ b_i}_i on variables and constants such that a variable can be substituted by a variable or a constant, and a constant can only be substituted by itself⁵. For instance, we have:

{Obs :: G(l) + b2, b1 O> G(l)}[{b1 ↦ B1, b2 ↦ B2, l ↦ Low}] = {Obs :: G(Low) + B2, B1 O> G(Low)}.
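The substitution mechanism can be sketched in Python as follows. The encoding is hypothetical: an agent state is a pair (agent, attribute), an observation is a pair (spot, states) and a dependence is a triple (cause, kind, effect); following the GUBS grammar, names starting with an upper-case letter are assumed to be constants and the others variables. The sketch reproduces the example above.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Hypothetical encoding: ("G", "l") stands for G(l), ("b2", None) for b2.
AgentState = Tuple[str, Optional[str]]
States = FrozenSet[AgentState]


def check_substitution(sigma: Dict[str, str]) -> None:
    # Assumed convention: capitalized names are constants and may only map to themselves.
    for source, target in sigma.items():
        if source[:1].isupper() and target != source:
            raise ValueError(f"constant {source} can only be substituted by itself")


def sub_states(states: States, sigma: Dict[str, str]) -> States:
    # Names missing from sigma map to themselves (identity mappings are omitted).
    return frozenset((sigma.get(a, a), sigma.get(x, x)) for a, x in states)


def substitute(observations: Set[Tuple[str, States]],
               dependences: Set[Tuple[States, str, States]],
               sigma: Dict[str, str]):
    check_substitution(sigma)
    obs = {(sigma.get(spot, spot), sub_states(st, sigma)) for spot, st in observations}
    deps = {(sub_states(c, sigma), kind, sub_states(e, sigma)) for c, kind, e in dependences}
    return obs, deps


# {Obs::G(l) + b2, b1 O> G(l)}[{b1 -> B1, b2 -> B2, l -> Low}]
program_obs = {("Obs", frozenset({("G", "l"), ("b2", None)}))}
program_deps = {(frozenset({("b1", None)}), "O>", frozenset({("G", "l")}))}
print(substitute(program_obs, program_deps, {"b1": "B1", "b2": "B2", "l": "Low"}))
```

Using frozensets for the agent states of a side makes the result independent of the order in which states are written.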


Functional synthesis rules. Functional synthesis is defined by rules (Table 4) governing the component assembly. Only the dependences and the attributes are functionally synthesized; the observation spots are considered as annotations used by the compilation process. To ensure correctness, each transform must preserve the original behaviour. Hence, each program resulting from the application of a rule must behaviourally include the previous one. Formally, functional synthesis is modelled by a relation on programs denoted by ⇝_σ, i.e., Q ⇝_σ P, where P is the initial program and Q the transformed one, such that each rule ensures that Q ⇝_σ P is correct with regard to a substitution σ, that is, P[σ] ⊑ Q[σ] and Q[σ] is observable. Also notice that behavioural inclusion is preserved by substitution (Proposition 2).


Proposition 2. For all substitutions σ, we have: $$P \sqsubseteq Q \implies P[\sigma] \sqsubseteq Q[\sigma].$$


Table 4 describes the functional synthesis rules⁶. Γ is a set of components representing the library. P ⊆_asm Q denotes the fact that program Q corresponds to an assembly including P, i.e., Q = (Q1, P, Q2), where Q1 or Q2 may be an empty program. Rule (Inst.) describes the fact that an observable instance of a part of a component in the library is functionally synthesized.


⁵ σP or P[σ] represents the application of σ to program P; identity substitutions are omitted.


⁶ Rules are of the form:
hypothesis_1 ... hypothesis_n
-----------------------------
conclusion
- INSTANTIATION -

Q[σ] ⊇_asm P[σ]    obs(Q[σ])    Q ∈ Γ
--------------------------------------- (Inst.)
Q ⇝_σ P

- COMMUTATIVITY, CONTRACTION -

Q ⇝_σ P, P'                    Q ⇝_σ P
------------- (Com.)           ------------ (Cont.)
Q ⇝_σ P', P                    Q ⇝_σ P, P

- ASSEMBLY -

Q ⇝_σ P    Q' ⇝_σ' P'    σ|VA(P)∩VA(P') = σ'|VA(P)∩VA(P')    obs(Q[σ], Q'[σ'])
------------------------------------------------------------------------------ (Asm.)
Q, Q' ⇝_{σ∪σ'} P, P'

Table 4: Functional synthesis rules


Rule (Com.) expresses the commutativity of the assembly. Rule (Cont.) 
contracts the redundant formulation of programs. Finally, Rule (Asm.) de- 
tails the conditions for an assembly of two components, each representing a 
functional synthesis of a part of the designed function. A detailed example 
of their use on a real case is given in Section 6. 


Theorem 1. The functional synthesis rules (Table 4) are correct. 


Another set of rules, more specifically devoted to dependences (Table 5), defines alternative ways of expressing similar behaviours. The table also includes the rules for agent states. Rule (Trans.) expands a persistent dependence (S1 ©> S3) into a chain by adding an intermediary (S2) to refine a pathway. Rule (N2P.) weakens a normal dependence (S1 O> S2) to a persistent one (S1 ©> S2), since the latter is a normal dependence with an additional property. Rule (R2N.) weakens a residual dependence (S1 ®> S2) to a normal dependence (S1 O> S2), since a normal dependence is also a residual dependence whose repetition of the effect is restricted to one step. According to these rules, all dependence chains can be implemented with persistent dependences. The final rules are devoted to agent states. Rules (SCom.) and (SCont.) describe the properties of the + operator, which is a logical ∧. Finally, Rule (Incl.) specifies that a behaviour can be extended with another one as long as the original one is still present.


- DEPENDENCES -

Q ⇝_σ S1 ©> S2, S2 ©> S3, A          Q ⇝_σ S1 ©> S2, A          Q ⇝_σ S1 O> S2, A
---------------------------- (Trans.) ------------------ (N2P.) ------------------ (R2N.)
Q ⇝_σ S1 ©> S3, A                    Q ⇝_σ S1 O> S2, A          Q ⇝_σ S1 ®> S2, A

- AGENT STATES -

s + s'            s + s             S1 + S2
-------- (SCom.)  ------ (SCont.)   --------- (Incl.)
s' + s            s                 S1

Table 5: Rules for the dependences and the agent states. S_i stands for a collection s_1 + ... + s_n of agent states, including negation, and A stands for the rest of the program.
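During synthesis, the dependence rules are applied from the goal upwards: starting from the dependence to synthesize (the conclusion), each rule yields the premise that remains to be matched. The Python sketch below replays (N2P.) followed by (Trans.) twice on the dependence detect O> AHL(low) of the band detector (Section 6). It assumes agent states are plain strings and omits contexts such as [Light]; the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class Dep(NamedTuple):
    cause: str   # agent states simplified to plain strings (assumption)
    kind: str    # "normal" (O>), "persistent" (©>) or "residual" (®>)
    effect: str


def n2p(d: Dep) -> Dep:
    """(N2P.) read from conclusion to premise: a normal dependence is
    obtained from a persistent one."""
    if d.kind != "normal":
        raise ValueError("(N2P.) expects a normal dependence")
    return d._replace(kind="persistent")


def r2n(d: Dep) -> Dep:
    """(R2N.) read backwards: a residual dependence is obtained from a normal one."""
    if d.kind != "residual":
        raise ValueError("(R2N.) expects a residual dependence")
    return d._replace(kind="normal")


def trans(d: Dep, fresh: str) -> List[Dep]:
    """(Trans.) read backwards: split S1 ©> S3 into S1 ©> fresh, fresh ©> S3."""
    if d.kind != "persistent":
        raise ValueError("(Trans.) expects a persistent dependence")
    return [Dep(d.cause, "persistent", fresh), Dep(fresh, "persistent", d.effect)]


# detect O> AHL(low), refined with the fresh variables v1 and v2.
chain = [n2p(Dep("detect", "normal", "AHL(low)"))]
for fresh in ("v1", "v2"):
    chain = chain[:-1] + trans(chain[-1], fresh)  # always split the last link
for d in chain:
    print(d)
```

Splitting the last link each time yields the chain detect ©> v1, v1 ©> v2, v2 ©> AHL(low), which is the form matched against components in Section 6.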


Theorem 2. The dependence rules are correct with respect to the model specification, and the agent-state rules are correct with respect to the logical operators (Table 5).


A possible algorithm for the assembly could be based on a combinatorial application of the rules. However, such an algorithm may prove inefficient in practice. An efficient compilation algorithm should rather be based on an internal representation of a program as a set of contextualized dependences with attributes, {(A, [K] S1 ⊳ S2)}, where ⊳ stands for any kind of dependence and A, K, S1, S2 are respectively: a set of attribute specifications related to the agents involved in the dependence, a set of contexts, and two sets of agent states. Any program can be encoded in this representation from a normal form of the program (not detailed here). Accordingly, the problem solved by the compilation algorithm can be defined as follows (Definition 4):


Definition 4 (Functional Synthesis Problem (FSP)). Let Γ = {Q_i}_{1≤i≤n} be a set where each Q_i is a set of contextualized dependences with attributes, and let P be a set of contextualized dependences with attributes. Can we find the smallest observable subset of components C ⊆ Γ such that there exists a substitution σ whose application on the components of C forms a cover of P[σ]? Formally:

$$\exists \sigma : P[\sigma] \subseteq \bigcup_{Q_j \in C} Q_j[\sigma] \ \wedge\ \mathrm{obs}\, C.$$


As the set cover problem is reducible to this problem, the problem is 
NP-complete (Proposition 3). 


Proposition 3. The Functional Synthesis Problem is NP-Complete. 
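The connection with set cover suggests the classic greedy set-cover heuristic as a simple baseline. The Python sketch below assumes the substitution has already been applied, so that P[σ] and the Q_j[σ] are ground sets of dependences, and it ignores observability; it is not the compiler described in this article, which relies on an evolutionary algorithm (Section 5.2). The small library is loosely modelled on the band detector example of Section 6.

```python
from typing import Dict, FrozenSet, Hashable, List, Set


def greedy_cover(target: Set[Hashable], library: Dict[str, FrozenSet[Hashable]]) -> List[str]:
    """Greedy set-cover heuristic: repeatedly pick the component covering the most
    still-uncovered dependences. Substitution and observability are ignored here."""
    uncovered = set(target)
    chosen: List[str] = []
    while uncovered:
        best = max(library, key=lambda name: len(library[name] & uncovered))
        gain = library[best] & uncovered
        if not gain:
            raise ValueError("the library cannot cover the program")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Ground dependences (cause, effect) of one sender branch after substitution.
target = {("Detect", "TetR"), ("TetR", "LuxI"), ("LuxI", "AHL(low)")}
library = {
    "Q1": frozenset({("Detect", "TetR")}),
    "Q2": frozenset({("TetR", "LuxI")}),
    "Q3": frozenset({("LuxI", "AHL(low)"), ("LuxI", "AHL(mid)"), ("LuxI", "AHL(high)")}),
    "Q7": frozenset({("LacIM1", "GFP")}),
}
print(greedy_cover(target, library))
```

Greedy set cover only guarantees a logarithmic approximation of the minimum, which is one reason to prefer a more flexible search in the actual compiler.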


5.2 Compilation Steps 


In this section, we detail the main steps of the compilation performing the functional synthesis. The result of the compilation is composed of a component list and a substitution list attributing a constant to each variable of the compiled program. The resolution relies on a heuristic algorithm aiming at finding a minimal set of components covering the behaviour of a program. Extension capabilities are also considered to facilitate further software developments. These developments would mainly improve component selection, notably by integrating biological compatibility between components without requiring it to be mentioned explicitly in the program, in order to ease the programmer's task. Hence, the design of the compiler must take both requirements into consideration: functional synthesis and software sustainability. These requirements orient the development towards a metaheuristic optimization, and more precisely an evolutionary algorithm, which provides a suitable framework for solving the functional synthesis problem while facilitating further developments.

Evolutionary algorithms [13, 21] are a class of metaheuristic optimization algorithms inspired by Darwinian evolution principles, mimicking the biological evolution process: an evolutionary algorithm selects candidate solutions and stochastically makes them evolve by recombination and mutation, improving their quality as quantified by a fitness function. Generally speaking, an evolutionary algorithm solves a multi-objective optimization problem specified as follows [40]:


$$\text{minimize } F(x) = (f_1(x), \ldots, f_n(x)) \quad \text{such that } x \in X,$$


where X is a set of viable solutions/individuals chosen in a domain X' and validated by a predicate p (i.e., X = {x ∈ X' | p(x)}), and F is a sequence of objective/fitness functions f_i : X → ℝ.

Accordingly, applying an evolutionary algorithm requires specifying three elements (the encoding of an individual x ∈ X, the viability constraint p and the fitness functions f_i) in accordance with the problem at hand, here FSP.


Individual. An individual stands for a proposal for solving FSP. It represents a subset of components C = {Q_i}_i chosen in the database Γ. As individuals correspond to finite subsets of a reference set (the database), they are implemented by Boolean vectors of size |Γ|, where 1 identifies the selected elements and 0 the others.
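A minimal Python sketch of this encoding is given below, together with two standard variation operators, bit-flip mutation and one-point crossover. The choice of these operators is an assumption made for illustration; the text does not specify which recombination and mutation operators the compiler uses.

```python
import random
from typing import List, Sequence


def decode(individual: Sequence[int], library: Sequence[str]) -> List[str]:
    """Selected components: the positions holding 1 in the Boolean vector."""
    return [name for bit, name in zip(individual, library) if bit]


def mutate(individual: Sequence[int], rate: float, rng: random.Random) -> List[int]:
    """Bit-flip mutation: each position is flipped with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def crossover(a: Sequence[int], b: Sequence[int], rng: random.Random) -> List[int]:
    """One-point crossover of two parents of equal length (at least 2)."""
    cut = rng.randrange(1, len(a))
    return list(a[:cut]) + list(b[cut:])


library = ["Q1", "Q2", "Q3", "Q4"]
rng = random.Random(0)
print(decode([1, 0, 1, 0], library))  # ['Q1', 'Q3']
print(crossover([1, 1, 1, 1], [0, 0, 0, 0], rng))
```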


Fitness functions. The fitness functions guide the selection of viable individuals to improve the synthesis quality. By definition of FSP, the number of components (i.e., the number of elements equal to 1 in a vector) is necessarily a fitness function, since we aim at minimizing it. However, other fitness functions may be added to better guide component selection, notably by accounting for biological aspects. As evolutionary algorithms deal with multiple objectives, adding them is technically easy. The difficulty lies rather in the ability to model biological constraints quantitatively. This challenging problem is not studied in this article but is considered as future work.
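With several objectives, candidate selections are usually compared by Pareto dominance. The Python sketch below builds the objective vector of an individual, with the component count first, and implements dominance for minimization. The second objective is a purely hypothetical incompatibility penalty, standing in for the biological criteria left as future work.

```python
from typing import Callable, Sequence, Tuple

Objective = Callable[[Sequence[int]], float]


def fitness(individual: Sequence[int], extra: Sequence[Objective] = ()) -> Tuple[float, ...]:
    """Objective vector to minimize: the number of selected components first,
    followed by any additional (e.g. biological) objectives."""
    return (sum(individual),) + tuple(f(individual) for f in extra)


def dominates(fa: Tuple[float, ...], fb: Tuple[float, ...]) -> bool:
    """Pareto dominance for minimization: no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(fa, fb)) and any(x < y for x, y in zip(fa, fb))


# Hypothetical second objective: a made-up incompatibility penalty per component.
PENALTY = [0.0, 0.5, 0.0, 2.0]


def incompatibility(individual: Sequence[int]) -> float:
    return sum(p for bit, p in zip(individual, PENALTY) if bit)


print(fitness([1, 1, 1, 0], [incompatibility]))  # (3, 0.5)
```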


Viability constraints. The viability constraints relate to the observability of an individual on the one hand, and to the ability to determine whether an individual behaviourally includes the program to compile on the other hand. Observability could be verified by translating the program into a formula and then applying a tableau method to check the satisfiability of the resulting formula. However, the exponential complexity of this algorithm would make it impractical in some cases. To circumvent this problem, we validate observability with another method, called strong observability (Obs), which is based on the syntax of the program and determines a sufficient condition for observability (i.e., Obs(P) ⟹ obs(P)). Basically, a program is not observable if the formula describing its semantics is unsatisfiable, for instance because it requires a variable and its negation to hold together; no Kripke model validates such a formula. In the context of GUBS, such a situation comes from the simultaneous occurrence of incompatible agent states. An incompatible pair of agent states corresponds to: an agent state and its negation (e.g., g, ¬g); agent states with mutually excluded attributes (e.g., g(Phos), g(UnPhos) with Phos # UnPhos); or an agent state expressed by an attribute and the negation of the same agent's state with an attribute of lower capacity than the first one (e.g., g(High), ¬g(Low) with Low < High). An incompatible pair of agent states arises in the following cases: either an incompatible pair occurs on the left or right side of a dependence, or there exists an incompatible pair in the agent states of a chain of persistent dependences, since the cause always occurs with the effect by definition of persistence. Otherwise, the program is observable.


Therefore, strong observability consists in checking whether these cases appear in the program, which can be done in polynomial time from its text and the attribute definitions by analysing the agent states of each dependence, and the agent states of each pair of persistent dependences in a chain of persistent dependences.
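The syntactic check can be sketched in Python as follows. The encoding is hypothetical and is not the GUBS compiler: an agent state carries an optional attribute and a polarity, `exclusive` lists pairs of mutually excluded attributes, and `below` lists pairs (lower, higher) of ordered attributes. As in the description above, chains of persistent dependences are inspected pairwise, which keeps the check polynomial.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Optional, Set, Tuple


class State(NamedTuple):
    agent: str
    attr: Optional[str] = None  # None for a plain agent state such as g
    positive: bool = True       # False for a negated state such as ¬g


class Dep(NamedTuple):
    cause: FrozenSet[State]
    effect: FrozenSet[State]
    persistent: bool = False


def incompatible(s: State, t: State, exclusive: Set[FrozenSet[str]],
                 below: Set[Tuple[str, str]]) -> bool:
    if s.agent != t.agent:
        return False
    if s.attr == t.attr and s.positive != t.positive:
        return True  # g and ¬g
    if s.positive and t.positive and frozenset((s.attr, t.attr)) in exclusive:
        return True  # g(Phos) and g(UnPhos) with Phos # UnPhos
    for pos, neg in ((s, t), (t, s)):
        if pos.positive and not neg.positive and (neg.attr, pos.attr) in below:
            return True  # g(High) and ¬g(Low) with Low < High
    return False


def conflict(states, exclusive, below) -> bool:
    return any(incompatible(s, t, exclusive, below) for s, t in combinations(states, 2))


def strongly_observable(deps: List[Dep], exclusive=frozenset(), below=frozenset()) -> bool:
    for d in deps:
        if conflict(d.cause, exclusive, below) or conflict(d.effect, exclusive, below):
            return False  # incompatible pair on one side of a dependence
        if d.persistent and conflict(d.cause | d.effect, exclusive, below):
            return False  # a persistent cause co-occurs with its effect
    persistent = [d for d in deps if d.persistent]
    for i, d1 in enumerate(persistent):
        for j, d2 in enumerate(persistent):
            # d1 feeds d2 in a chain, so all of their agent states co-occur
            if i != j and d1.effect & d2.cause:
                if conflict(d1.cause | d1.effect | d2.cause | d2.effect, exclusive, below):
                    return False
    return True


# The unobservable example above: ¬g with a persistent effect g.
print(strongly_observable([Dep(frozenset({State("g", positive=False)}),
                               frozenset({State("g")}), persistent=True)]))  # False
```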


Behavioural inclusion requires "matching" each causal dependence of the program with a causal dependence of an individual while respecting the nature and the structure of the dependences. Matching is a purely syntactic process operating on the text of programs, which can be assimilated to a unification of terms. For FSP, unification is applied to associative, commutative and idempotent functions (ACI-unification). Indeed, in Tables 4 and 5, Rules (Com.), (SCom.) and Rules (Cont.), (SCont.) identify the respective roles of commutativity and idempotency in the synthesis, whereas Rule (Inst.) (Table 4) characterizes the outcome of unification. ACI-unification [2] solves equations between terms built with associative, commutative and idempotent operators by determining a list of substitutions (unifiers). In our context, the objective is to find a substitution σ and a part Q of an individual such that Q = Pσ, where P is the program to compile. Thus, a program is viewed as a term representing a union of causal dependences, where a causal relation stands for a symbolic (non-commutative) binary function applied to two sets containing variables and constants as parameters. The ACI-unification problem is NP-complete [25, 2]. However, efficient heuristics covering the different variants of the ACI-unification problem, such as set unification, have been proposed in the literature [35, 18, 36] and can be adapted to our case.


Figure 3 describes the compilation process of a GUBS program. Additionally, several CAD environments for synthetic biology design use metaheuristic optimization algorithms, in a way similar to genetic programming for combinatorial logic function generation, where the design of logical circuits complying with an expected input/output profile specifying their behaviour is automatically achieved by an evolutionary algorithm. Pioneering work in synthetic biology was undertaken in [19] for the design of a bistable oscillatory circuit using a genetic algorithm. In [34], the authors apply a Monte Carlo algorithm to synthesize a de novo ribosome binding site. In [33, 32], the authors propose an environment based on an evolutionary algorithm to automatically produce small regulatory networks from a library of biological components matching a behaviour represented by an evolution profile of RNA concentration for some target genes. These works show the applicability of evolutionary algorithms to automatic design in synthetic biology. Although the use of an evolutionary algorithm in GUBS differs, it serves the same objective and opens up the possibility of an integration based on the same optimization framework.
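Putting the pieces together, the loop below is a deliberately small, single-objective evolutionary algorithm in the spirit of Figure 3: bit-flip mutation, elitist truncation selection on the number of components, and a viability test reduced to coverage of ground dependences. All names and parameter values are illustrative assumptions; the sketch omits the substitution search, the strong observability check and the multi-objective selection that a full compiler would use.

```python
import random
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Item = Tuple[str, str]  # a ground dependence (cause, effect), after substitution


def covers(bits: Sequence[int], library: Dict[str, FrozenSet[Item]], target: Set[Item]) -> bool:
    """Viability stub: the selected components cover every dependence of the program."""
    selected = set().union(*(comp for bit, comp in zip(bits, library.values()) if bit))
    return target <= selected


def evolve(library: Dict[str, FrozenSet[Item]], target: Set[Item], pop_size: int = 20,
           generations: int = 50, rate: float = 0.2, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    n = len(library)
    full = [1] * n
    if not covers(full, library, target):
        raise ValueError("no selection of the library covers the program")
    # Initial population: the full selection plus random individuals that are viable.
    randoms = ([rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1))
    population = [full] + [ind for ind in randoms if covers(ind, library, target)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            parent = rng.choice(population)
            child = [1 - b if rng.random() < rate else b for b in parent]  # bit-flip mutation
            if covers(child, library, target):  # keep viable individuals only
                children.append(child)
        # Elitist selection on the single objective: the number of components.
        population = sorted(population + children, key=sum)[:pop_size]
    best = min(population, key=sum)
    return [name for bit, name in zip(best, library) if bit]


library = {
    "Q1": frozenset({("Detect", "TetR")}),
    "Q2": frozenset({("TetR", "LuxI")}),
    "Q3": frozenset({("LuxI", "AHL(low)"), ("LuxI", "AHL(mid)")}),
    "Q7": frozenset({("LacIM1", "GFP")}),
    "Q8": frozenset({("LacI", "GFP")}),
}
target = {("Detect", "TetR"), ("TetR", "LuxI"), ("LuxI", "AHL(low)")}
print(evolve(library, target))
```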
Figure 3: Overview of compilation process of GUBS.


6 Example 


The application of the rules underlying the compilation process is described here on a real case: the design of the Band Detector proposed in [3]. This example shows how a complex design can be synthesized from a simple abstract definition of the functionality. Accordingly, GUBS may be used to describe a behaviour at a high level of abstraction as well as at a low level, detailing the components involved in the design. We introduce each step of the transforms from the high-level program to the low-level one. Each step is ruled by the application of the rewriting rules defined in Tables 4 and 5, which ensures its correctness and thus its functional safety in the context of an open system. The design of the example aims at forming patterns of different colours in a population of bacteria, exploiting the quorum sensing phenomenon and staining with a fluorescent protein (GFP). The amount of molecules of interest a cell receives depends on its position relative to the cell diffusing them, whose production is controlled by an external event: the farther a cell is from the source, the fewer molecules it receives. The activation or inhibition of the fluorescent protein, depending on the concentration, distinguishes the bands surrounding the source. In the original design, the protein does not fluoresce in an intermediary band.

Figure 4: The band detector regulatory circuit.
From a computing standpoint, we can assimilate the design to a message transmission coupled to a sensor/actuator responsible for fluorescence, leading to the concise GUBS program presented below: the diffusive molecule is AHL, whose production is controlled by a context, and the observation is applied on GFP. Two categories of cells are defined: the Sender and the Receiver. Therefore, two GUBS programs identify the two cell types.


Sender   = { AHL:{low # mid # high},
             [Light]{detect O> AHL(low), detect O> AHL(mid), detect O> AHL(high)} }
Receiver = { AHL(low) O> GFP, AHL(mid) O> GFP, AHL(high) O> GFP, obs1::GFP, obs2::GFP }


Figure 4 describes the original genetic circuit used in [3]. The diffusible molecule is the constant AHL. detect is a variable used to represent the initial action of Light activating AHL diffusion. The gene LuxR has three activation thresholds: at level 2, it activates both LacIM1 and CI; at level 1, the amount of AHL only allows the activation of CI; and finally, at level 0, none is activated.

To make the compilation easier to follow, we label each dependence of the sender-receiver program (Table 6). We show that, from the sender-receiver program, we obtain the original design by applying the aforementioned rules with an appropriate selection of components (Table 7). The regulations of Figure 4 are described in a GUBS program that translates their regulatory action in terms of dependences and relations on their attributes. We focus here on some illustrative steps of the sender program compilation; the complete functional synthesis is given next. The compilation consists in finding the appropriate components whose assembly behaviourally includes the sender-receiver program, with the particularity that the diffusive molecule must be the same in both programs.


Sender                                 | Receiver
P11 = {[Light]{detect O> AHL(low)}}    | P21 = {AHL(low) O> GFP}
P12 = {[Light]{detect O> AHL(mid)}}    | P22 = {AHL(mid) O> GFP}
P13 = {[Light]{detect O> AHL(high)}}   | P23 = {AHL(high) O> GFP}


with {AHL:{low # mid # high}} as attributes of AHL.


Table 6: Separation of the dependences.


Q1 = {[Light]{detect O> TetR}}

Q2 = {TetR -> LuxI}

Q3 = {AHL:{low # mid # high}, LuxI -> AHL(low), LuxI -> AHL(mid), LuxI -> AHL(high)}

Q4 = {AHL:{low # mid # high}, LuxR:{low # {mid < high}},
      AHL(mid) O> LuxR(mid), AHL(high) O> LuxR(high)}

Q5 = {LuxR:{low # {mid < high}}, LuxR(mid) -> CI, LuxR(high) -> CI + LacIM1}

Q6 = {CI -> LacI}

Q7 = {LacIM1 -> GFP}

Q8 = {LacI -> GFP}


Table 7: Part of the database dedicated to the Band Detector.


In the sequel, Pij refers to the j-th normalized causal relation of program Pi, where P1 is the Sender and P2 is the Receiver. Let us consider P11, whose compilation is similar to that of P12 and P13. Notice that P11 cannot be directly instantiated with any component because, on the one hand, component Q1 contains a context like P11 but applied on gene TetR instead of AHL, and, on the other hand, Q3 has the AHL molecule but defines no context. So, to match P11 with the components Q1, Q2 and Q3, the normal dependence is first converted into a persistent one (Rule (N2P.)).


Q1, Q2, Q3 ⇝_σ {[light]{detect ©> AHL(low)}}
--------------------------------------------- (N2P.)
Q1, Q2, Q3 ⇝_σ P11


Thereby, the resulting dependence can also be separated to match the assembly Q1, Q2, Q3 by applying the (Trans.) rule twice, where v1 and v2 are fresh variables.


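Q1, Q2, Q3 ⇝_σ P'11 = {[light]{detect ©> v1, v1 ©> v2, v2 ©> AHL(low)}}
------------------------------------------------------------------------ (Trans.)
Q1, Q2, Q3 ⇝_σ {[light]{detect ©> v1, v1 ©> AHL(low)}}
------------------------------------------------------------------------ (Trans.)
Q1, Q2, Q3 ⇝_σ {[light]{detect ©> AHL(low)}}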


Finally, we obtain a new program P'11 compatible with Q1, Q2, Q3, and each variable is substituted by a constant (biological element) through the application of Rule (Inst.). For P'11, we have:


(Q1, Q2, Q3)[σ] ⊇_asm P'11[σ]    obs((Q1, Q2, Q3)[σ])
------------------------------------------------------------------ (Inst.)
Q1, Q2, Q3 ⇝_σ [light]{detect ©> v1, v1 ©> v2, v2 ©> AHL(low)}

where σ = {light/Light, v1/TetR, v2/LuxI}.


Following this scheme for P12 and P13, we respectively obtain P'12 and P'13. The final assembly corresponds to the functional synthesis of the Sender program:


Q1, Q2, Q3 ⇝_σ P11    Q1, Q2, Q3 ⇝_σ' P12    Q1, Q2, Q3 ⇝_σ'' P13
------------------------------------------------------------------ (Asm.)
Q1, Q2, Q3 ⇝_{σ∪σ'∪σ''} P11, P12, P13


In conclusion, the functional synthesis generates the original genetic circuit (Figure 4) from the sender program. A similar approach can also be applied to obtain the receiver program (see the complete compilation below).


Sender = {AHL:{low # mid # high}, [Light]{detect O> TetR},
          TetR -> LuxI, LuxI -> AHL(low), LuxI -> AHL(mid), LuxI -> AHL(high)}


Complete Compilation of the Band Detector 


This section details, step by step, the application of the rules to perform the functional synthesis of the Band Detector example (Tables 8, 9 and 10).
- SENDER -

P'11 = [light]{detect ©> v1, v1 ©> v2, v2 ©> AHL(low)}
    ⇓ (Trans.)
[light]{detect ©> v1, v1 ©> AHL(low)}
    ⇓ (Trans.)
[light]{detect ©> AHL(low)}
    ⇓ (N2P.)
P11

P'12 = [light]{detect ©> v3, v3 ©> v4, v4 ©> AHL(mid)}
    ⇓ (Trans.)
[light]{detect ©> v3, v3 ©> AHL(mid)}
    ⇓ (Trans.)
[light]{detect ©> AHL(mid)}
    ⇓ (N2P.)
P12

P'13 = [light]{detect ©> v5, v5 ©> v6, v6 ©> AHL(high)}
    ⇓ (Trans.)
[light]{detect ©> v5, v5 ©> AHL(high)}
    ⇓ (Trans.)
[light]{detect ©> AHL(high)}
    ⇓ (N2P.)
P13

(Q1, Q2, Q3)[σ] ⊇_asm P'11[σ]    obs((Q1, Q2, Q3)[σ])
------------------------------------------------------ (Inst.)
Q1, Q2, Q3 ⇝_σ P'11
with σ = {detect/Detect, light/Light, v1/TetR, v2/LuxI}

(Q1, Q2, Q3)[σ'] ⊇_asm P'12[σ']    obs((Q1, Q2, Q3)[σ'])
-------------------------------------------------------- (Inst.)
Q1, Q2, Q3 ⇝_σ' P'12
with σ' = {detect/Detect, light/Light, v3/TetR, v4/LuxI}

(Q1, Q2, Q3)[σ''] ⊇_asm P'13[σ'']    obs((Q1, Q2, Q3)[σ''])
---------------------------------------------------------- (Inst.)
Q1, Q2, Q3 ⇝_σ'' P'13
with σ'' = {detect/Detect, light/Light, v5/TetR, v6/LuxI}

Q1, Q2, Q3 ⇝_σ P11    Q1, Q2, Q3 ⇝_σ' P12    Q1, Q2, Q3 ⇝_σ'' P13
------------------------------------------------------------------ (Asm.)
Q1, Q2, Q3 ⇝_{σ∪σ'∪σ''} P11, P12, P13


1. First, we split the sender program into three subprograms P11, P12 and P13, each corresponding to a causal relation.


2. Initially, P11, P12 and P13 do not match any component of the database, so we extend them (P'11, P'12 and P'13) to find a matching.


3. Finally, we match P'11, P'12 and P'13 with components Q1, Q2, Q3.


Table 8: Sender compilation.
- RECEIVER -

P'21 = AHL(low) ©> v1, v1 ©> v2, v2 ©> v3, v3 ©> GFP
    ⇓ (Trans.)
AHL(low) ©> v1, v1 ©> v2, v2 ©> GFP
    ⇓ (Trans.)
AHL(low) ©> v1, v1 ©> GFP
    ⇓ (Trans.)
AHL(low) ©> GFP
    ⇓ (N2P.)
P21

P'22 = AHL(mid) ©> v4, v4 ©> v5, v5 ©> v6, v6 ©> GFP
    ⇓ (Trans.)
AHL(mid) ©> v4, v4 ©> v5, v5 ©> GFP
    ⇓ (Trans.)
AHL(mid) ©> v4, v4 ©> GFP
    ⇓ (Trans.)
AHL(mid) ©> GFP
    ⇓ (N2P.)
P22

P'23 = AHL(high) ©> v7, v7 ©> v8, v8 ©> GFP
    ⇓ (Trans.)
AHL(high) ©> v7, v7 ©> GFP
    ⇓ (Trans.)
AHL(high) ©> GFP
    ⇓ (N2P.)
P23

(Q4, Q5, Q6, Q8)[σ] ⊇_asm P'21[σ]    obs((Q4, Q5, Q6, Q8)[σ])
------------------------------------------------------------ (Inst.)
Q4, Q5, Q6, Q8 ⇝_σ P'21
with σ = {v1/LuxR, v2/CI, v3/LacI}

(Q4, Q5, Q6, Q8)[σ'] ⊇_asm P'22[σ']    obs((Q4, Q5, Q6, Q8)[σ'])
-------------------------------------------------------------- (Inst.)
Q4, Q5, Q6, Q8 ⇝_σ' P'22
with σ' = {v4/LuxR, v5/CI, v6/LacI}

(Q4, Q5, Q7)[σ''] ⊇_asm P'23[σ'']    obs((Q4, Q5, Q7)[σ''])
---------------------------------------------------------- (Inst.)
Q4, Q5, Q7 ⇝_σ'' P'23
with σ'' = {v7/LuxR, v8/LacIM1}

Q4, Q5, Q6, Q8 ⇝_σ P21    Q4, Q5, Q6, Q8 ⇝_σ' P22    Q4, Q5, Q7 ⇝_σ'' P23
-------------------------------------------------------------------------- (Asm.)
Q4, Q5, Q6, Q7, Q8 ⇝_{σ∪σ'∪σ''} P21, P22, P23


1. As for the sender program, the receiver program is split into three subprograms P21, P22 and P23, each corresponding to a cause differing by its AHL concentration.


2. P21, P22 and P23 initially do not match any component in the database, so we extend them (P'21, P'22 and P'23) by applying Rules (Trans.) and (N2P.).


3. P'21 and P'23 describe the same behaviour for two different AHL concentrations. Thus, their respective variables (v1 and v7) are substituted by the same constant LuxR. P'22 describes the presence of GFP, matching the components {Q4, Q5, Q6, Q8}. Finally, P'21, P'22 and P'23 match the components {Q4, Q5, Q6, Q7, Q8}.


Table 9: Receiver compilation.
- FINAL DESIGN -

Sender
{AHL:{low # mid # high}, [Light]{detect O> TetR},
 TetR -> LuxI, LuxI -> AHL(low),
 LuxI -> AHL(mid), LuxI -> AHL(high)}

Receiver
{AHL:{low # mid # high}, LuxR:{low # {mid < high}},
 AHL(mid) O> LuxR(mid), AHL(high) O> LuxR(high),
 LuxR(mid) -> CI, LuxR(high) -> LacIM1,
 CI -> LacI, LacIM1 -> GFP, LacI -> GFP}


Table 10: Complete band detector compilation.


7 Conclusion 


With the GUBS language, we propose to characterize a programming paradigm abstracting molecular interactions in the context of open systems, which differs from approaches dedicated to biological system modelling. Accordingly, interactions are symbolized by causal dependences whose interpretation is driven by their effects. We have demonstrated a proof of concept of compilation based on rewriting rules and illustrated it on a realistic example. A perspective of this work is to improve component selection by identifying the relevant biological parameters and defining appropriate fitness functions, so that the selection also accounts for quantitative biological constraints.
Appendix

program      ::= {behaviour}
behaviour    ::= behaviour, behaviour | behaviour
behaviour    ::= compartment | dependence | context | observation | defattribute
compartment  ::= varconstant {behaviour}
observation  ::= varconstant::words
context      ::= [varconstants] {behaviour}
dependence   ::= words O> words | words ©> words | words ®> words
word         ::= attribute | varconstant(attribute) | varconstant.word
words        ::= words + word | word
attribute    ::= varconstant | varconstant
defattribute ::= varconstants : attspec
attspec      ::= defspec{varconstants} | {attrels}
defspec      ::= exclusion | inclusion
attrels      ::= attrels, attrel | attrel
attrel       ::= varconstant < varconstant | varconstant # varconstant | varconstant
varconstant  ::= word | Word
varconstants ::= varconstants, varconstant | varconstant


Table 11: Syntax of GUBS programs


Proofs 


Proposition 1. By contradiction, assume that P is unobservable; then no model satisfies its interpretation. As Q is observable, there exist models satisfying [Q], but no restricted model can satisfy [P], which contradicts the definition of behavioural inclusion (Definition 3).


Proposition 4. Let ψ ∈ F_H be a formula, let σ : (NOM ∪ PROP ∪ REL) → (NOM ∪ PROP ∪ REL) be a substitution on nominals, variables and relational symbols, and let M = (W, (R_k)_{k∈REL}, V) be a model. We define the model M̂ = (W, (R̂_k)_{k∈REL}, V̂) from M as follows:

1. ∀a ∈ NOM ∪ PROP, ∀w ∈ W : w ∈ V(aσ) ⟺ w ∈ V̂(a);


2. ∀k ∈ REL : w R_{kσ} w' ⟺ w R̂_k w'.


Then we have: M, w ⊩ ψσ ⟺ M̂, w ⊩ ψ.
Proof. The proof is by induction on the formula. Without loss of generality, we assume that ψ is in Negation Normal Form, where negation occurs only immediately before variables. Recall that every formula can be put in Negation Normal Form.


M, w ⊩ aσ ⟺ M̂, w ⊩ a, for a ∈ PROP ∪ NOM. By (1), we have w ∈ V(aσ) ⟺ w ∈ V̂(a), leading to the equivalence.


M, w ⊩ ¬aσ ⟺ M̂, w ⊩ ¬a. By definition of the realizability relation, this is equivalent to M, w ⊮ aσ ⟺ M̂, w ⊮ a. By (1), this equivalence holds.


M, w ⊩ (ψ1 ∧ ψ2)σ ⟺ M̂, w ⊩ ψ1 ∧ ψ2. By definition of the substitution, we have to prove M, w ⊩ (ψ1σ) ∧ (ψ2σ) ⟺ M̂, w ⊩ ψ1 ∧ ψ2. By definition of the realizability relation, we can formulate the property equivalently as follows:


$$(\mathcal{M}, w \Vdash \psi_1\sigma) \wedge (\mathcal{M}, w \Vdash \psi_2\sigma) \iff (\hat{\mathcal{M}}, w \Vdash \psi_1) \wedge (\hat{\mathcal{M}}, w \Vdash \psi_2).$$


By the induction hypothesis, we have M, w ⊩ ψ1σ ⟺ M̂, w ⊩ ψ1 and M, w ⊩ ψ2σ ⟺ M̂, w ⊩ ψ2, implying the previous condition.


M, w ⊩ (ψ1 ∨ ψ2)σ ⟺ M̂, w ⊩ ψ1 ∨ ψ2. The proof is similar to that of the previous item (∧).


M, w ⊩ (@_a ψ)σ ⟺ M̂, w ⊩ @_a ψ. By definition of the substitution, we have to prove that M, w ⊩ @_{aσ}(ψσ) ⟺ M̂, w ⊩ @_a ψ. By definition of the realizability relation, this is equivalent to:


$$\exists w' \in W : w' \in V(a\sigma) \wedge \mathcal{M}, w' \Vdash \psi\sigma \iff \exists w'' \in W : w'' \in \hat{V}(a) \wedge \hat{\mathcal{M}}, w'' \Vdash \psi.$$


By setting w'' = w', from (1) we have w' ∈ V(aσ) ⟺ w' ∈ V̂(a). By the induction hypothesis, we have M, w' ⊩ ψσ ⟺ M̂, w' ⊩ ψ. Both properties imply that:


$$\exists w' \in W : w' \in V(a\sigma) \wedge \mathcal{M}, w' \Vdash \psi\sigma \iff \exists w' \in W : w' \in \hat{V}(a) \wedge \hat{\mathcal{M}}, w' \Vdash \psi,$$

which implies the initial property.


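$\overline{\mathcal{M}}, w \Vdash (\langle k\rangle\psi)\sigma \iff \mathcal{M}, w \Vdash \langle k\rangle\psi$. By definition of the substitution, we have to prove that $\overline{\mathcal{M}}, w \Vdash \langle k\sigma\rangle\psi\sigma \iff \mathcal{M}, w \Vdash \langle k\rangle\psi$. By definition of the realizability relation, this condition is equivalent to: 

$$\exists w' \in W: \overline{\mathcal{M}}, w' \Vdash \psi\sigma \wedge w \overline{R}_{k\sigma} w' \iff \exists w'' \in W: \mathcal{M}, w'' \Vdash \psi \wedge w R_k w''.$$ 

By setting $w' = w''$, the equivalence $w \overline{R}_{k\sigma} w' \iff w R_k w'$ holds from (2). By induction hypothesis, we have $\overline{\mathcal{M}}, w' \Vdash \psi\sigma \iff \mathcal{M}, w' \Vdash \psi$. Since both equivalences hold for every $w' \in W$, they imply that: 

$$\exists w' \in W: \overline{\mathcal{M}}, w' \Vdash \psi\sigma \wedge w \overline{R}_{k\sigma} w' \iff \exists w' \in W: \mathcal{M}, w' \Vdash \psi \wedge w R_k w',$$ 

which implies the initial property. 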
$\overline{\mathcal{M}}, w \Vdash ([k]\psi)\sigma \iff \mathcal{M}, w \Vdash [k]\psi$. The proof is similar to that of the previous item. 


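$\overline{\mathcal{M}}, w \Vdash (E\psi)\sigma \iff \mathcal{M}, w \Vdash E\psi$. By definition of the substitution, we have to prove that $\overline{\mathcal{M}}, w \Vdash E(\psi\sigma) \iff \mathcal{M}, w \Vdash E\psi$. 

By definition of the realizability relation, this is equivalent to: 

$$\exists w' \in W: \overline{\mathcal{M}}, w' \Vdash \psi\sigma \iff \exists w' \in W: \mathcal{M}, w' \Vdash \psi,$$ 

which follows directly from the induction hypothesis. 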


$\overline{\mathcal{M}}, w \Vdash (A\psi)\sigma \iff \mathcal{M}, w \Vdash A\psi$. The proof is similar to that of the previous item. 
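Proposition 4 can be sanity-checked on a finite model. The sketch below uses a hypothetical tuple encoding of formulas, implements the realizability clauses as they are unfolded in the proof (including the existential reading of $@$), builds $\overline{\mathcal{M}}$ from items (1) and (2), and compares both sides of the equivalence at every world. It assumes that $\sigma$ is injective on the symbols interpreted by $\mathcal{M}$, so that $\overline{V}$ and $\overline{R}$ are well defined, and that $\mathcal{M}$ interprets every symbol occurring in $\psi$; the names `Model`, `sat`, `subst`, `transport` and `check_prop4` are illustrative.

```python
from typing import NamedTuple


class Model(NamedTuple):
    worlds: frozenset
    rel: dict  # relational symbol k -> set of pairs (w, w') such that w R_k w'
    val: dict  # nominal or propositional symbol a -> set of worlds V(a)


def sat(m, w, f):
    """Realizability relation m, w ||- f for tuple-encoded formulas."""
    tag = f[0]
    if tag in ("prop", "nom"):
        return w in m.val.get(f[1], set())
    if tag == "not":
        return not sat(m, w, f[1])
    if tag == "and":
        return sat(m, w, f[1]) and sat(m, w, f[2])
    if tag == "or":
        return sat(m, w, f[1]) or sat(m, w, f[2])
    if tag == "at":
        return any(v in m.val.get(f[1], set()) and sat(m, v, f[2]) for v in m.worlds)
    if tag == "dia":
        return any((w, v) in m.rel.get(f[1], set()) and sat(m, v, f[2]) for v in m.worlds)
    if tag == "box":
        return all((w, v) not in m.rel.get(f[1], set()) or sat(m, v, f[2]) for v in m.worlds)
    if tag == "E":
        return any(sat(m, v, f[1]) for v in m.worlds)
    if tag == "A":
        return all(sat(m, v, f[1]) for v in m.worlds)
    raise ValueError(f"unknown connective {tag!r}")


def subst(f, sigma):
    """Apply sigma to nominals, propositional variables and relational symbols."""
    tag = f[0]
    if tag in ("prop", "nom"):
        return (tag, sigma.get(f[1], f[1]))
    if tag in ("not", "E", "A"):
        return (tag, subst(f[1], sigma))
    if tag in ("and", "or"):
        return (tag, subst(f[1], sigma), subst(f[2], sigma))
    if tag in ("at", "dia", "box"):
        return (tag, sigma.get(f[1], f[1]), subst(f[2], sigma))
    raise ValueError(f"unknown connective {tag!r}")


def transport(m, sigma):
    """Build M-bar: V-bar(a sigma) = V(a) (item 1) and R-bar_(k sigma) = R_k (item 2)."""
    symbols = list(m.val) + list(m.rel)
    if len({sigma.get(x, x) for x in symbols}) != len(symbols):
        raise ValueError("sigma must be injective on the symbols of the model")
    return Model(m.worlds,
                 {sigma.get(k, k): set(r) for k, r in m.rel.items()},
                 {sigma.get(a, a): set(v) for a, v in m.val.items()})


def check_prop4(m, sigma, psi):
    """Compare M-bar, w ||- psi sigma with M, w ||- psi at every world w."""
    mbar = transport(m, sigma)
    return all(sat(mbar, w, subst(psi, sigma)) == sat(m, w, psi) for w in m.worlds)


M = Model(frozenset({0, 1, 2}), {"k": {(0, 1), (1, 2)}}, {"i": {1}, "p": {1, 2}})
sigma = {"i": "j", "p": "r", "k": "l"}
psi = ("and", ("dia", "k", ("prop", "p")), ("at", "i", ("box", "k", ("prop", "p"))))
print(check_prop4(M, sigma, psi))  # True
```

A finite check of this kind does not replace the inductive proof, but it makes the role of items (1) and (2) visible: `transport` only renames the keys of $V$ and $(R_k)$, which is exactly what each inductive case uses.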


Proposition 2. First, let us remark that when $P \not\sqsubseteq Q$, the property is trivially verified. Moreover, under the assumption $P \sqsubseteq Q$, if $Q[\sigma]$ is not observable, the property is also verified because an unobservable program behaviourally includes every program (Definition 3). 

In the rest of the proof, we assume that $P$ is behaviourally included in $Q$ and that $Q[\sigma]$ is observable (i.e., $P \sqsubseteq Q$ and $\mathrm{obs}\, Q[\sigma]$). Hence, by definition of observability, there exists a model $\overline{\mathcal{M}}$ such that $\overline{\mathcal{M}} \Vdash [\![Q[\sigma]]\!]$. By Proposition 4, we deduce that there exists a model $\mathcal{M}$ such that $\mathcal{M} \Vdash [\![Q]\!]$. Moreover, as $P \sqsubseteq Q$ by hypothesis, there exists $S \subseteq \mathrm{Dom}\,\mathcal{M}$ such that $\mathcal{M}|_S \Vdash [\![P]\!]$. By construction of $\overline{\mathcal{M}}$, we deduce that there exists a submodel of $\overline{\mathcal{M}}$, denoted by $\mathcal{M}'$, satisfying properties (1) and (2) of Proposition 4 with respect to $\mathcal{M}|_S$. Moreover, we have $\mathcal{M}' \Vdash [\![P[\sigma]]\!]$ by Proposition 4. We conclude that $P[\sigma] \sqsubseteq Q[\sigma]$. 
Proposition 3. By reduction from the minimum cover problem (SP5 in [20]). 

The problem is in NP. Given a substitution $\sigma$ and a set of components $\mathcal{C} = \{Q_j\}_j$, checking whether $P[\sigma]$ is included in $\bigcup_{Q_j \in \mathcal{C}} Q_j[\sigma]$ can be performed in polynomial time. 

The problem is NP-hard. The reduction is performed from the minimum cover problem, based on an encoding of elements by dependences. 

Instance: Collection $\mathcal{X}$ of subsets of a finite set $S$, and a positive integer $k \leq |\mathcal{X}|$. 

Question: Does $\mathcal{X}$ contain a cover for $S$ of size $k$ or less, i.e., a subset $\mathcal{X}' \subseteq \mathcal{X}$ with $|\mathcal{X}'| \leq k$ such that every element of $S$ belongs to at least one member of $\mathcal{X}'$? 


Reduction. Each element $A \in S$ is assimilated to a constant and translated into a dependence $A \rightarrow A$. Therefore, the substitution is trivially the identity. The database is $\mathcal{X}$ (i.e., $\mathcal{X} = \mathcal{I}$). Finally, the result of the functional synthesis is $\mathcal{X}'$ (i.e., $\mathcal{X}' = \mathcal{C}$). 
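The reduction can be made concrete with a brute-force sketch. It assumes a hypothetical encoding in which the dependence $A \rightarrow A$ is the pair `(A, A)`, a component is a set of dependences, and inclusion of programs is set inclusion. Under the identity substitution, a cover of size at most $k$ then exists exactly when at most $k$ components of the database jointly include the program. The `covers` check corresponds to the polynomial verification used for NP membership, whereas `synthesize` enumerates subsets and is therefore exponential in the worst case; all names are illustrative.

```python
from itertools import combinations


def dependence(a):
    # Hypothetical encoding: the element a becomes the self-dependence a -> a.
    return (a, a)


def encode(S, X):
    """The program P contains one dependence per element of S; each subset is a component."""
    program = frozenset(dependence(a) for a in S)
    database = [frozenset(dependence(a) for a in subset) for subset in X]
    return program, database


def covers(program, chosen):
    # Polynomial-time certificate check; the substitution is the identity.
    return program <= frozenset().union(*chosen)


def synthesize(program, database, k):
    """Return the indices of at most k components covering the program, or None."""
    for size in range(k + 1):
        for idx in combinations(range(len(database)), size):
            if covers(program, [database[i] for i in idx]):
                return list(idx)
    return None


S = {1, 2, 3, 4, 5}
X = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
P, db = encode(S, X)
print(synthesize(P, db, 2))  # [0, 3]
print(synthesize(P, db, 1))  # None
```

Searching by increasing size returns a minimum cover first, which matches the "size $k$ or less" formulation of the question.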


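Theorem 1. First, let us remark that $P \sqsubseteq Q$ is true whenever $Q$ is not observable, i.e., no model satisfies $Q$, by definition of the behavioural inclusion (Definition 3). Hence, the proof does not consider this trivially verified case but rather the case where some model $\mathcal{M}$ satisfies $\mathcal{M} \Vdash Q$. 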


Inst. Directly from the definition of the behavioural inclusion (Definition 3). 


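Com. By definition of the semantics, $[\![P, P']\!] = A(\phi \wedge \phi') = A(\phi' \wedge \phi) = [\![P', P]\!]$, with $[\![P]\!]_p = \phi$ and $[\![P']\!]_p = \phi'$. Thus, for all $\mathcal{M}$, we have $\mathcal{M} \Vdash [\![P, P']\!] \iff \mathcal{M} \Vdash [\![P', P]\!]$. Hence, if $Q \sqsupseteq P, P'$, we conclude that $Q \sqsupseteq P', P$. 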


Cont. Similar to the proof of (Com.). 


Asm. First, let us remark that $\sigma|_{VA(P) \cap VA(P')} = \sigma'|_{VA(P) \cap VA(P')}$ means that $\sigma$ and $\sigma'$ substitute the common variables identically, leading to $Q[\sigma \cup \sigma'] = Q[\sigma]$ and $Q'[\sigma \cup \sigma'] = Q'[\sigma']$. Let $\sigma'' = \sigma \cup \sigma'$. Then, by definition of the semantics (Table 2) and of $\sigma''$, we have the following property: 


$$\forall \mathcal{M} \in KS([\![(Q, Q')[\sigma'']]\!]): \mathcal{M} \Vdash [\![Q[\sigma]]\!] \wedge \mathcal{M} \Vdash [\![Q'[\sigma']]\!].$$ 


Notice that the set of models $KS([\![(Q, Q')[\sigma'']]\!])$ is not empty since, by hypothesis, $\mathrm{obs}(Q[\sigma], Q'[\sigma'])$ holds. As $Q \sqsupseteq P$ and $Q' \sqsupseteq P'$, any model validating $Q$ (resp. $Q'$) also validates $P$ (resp. $P'$) by definition of the functional synthesis. Then, we deduce that: 


$$\forall \mathcal{M} \in KS([\![(Q, Q')[\sigma'']]\!]): \mathcal{M} \Vdash [\![P[\sigma]]\!] \wedge \mathcal{M} \Vdash [\![P'[\sigma']]\!].$$ 

Then, we conclude that: 


$$\forall \mathcal{M} \in KS([\![(Q, Q')[\sigma'']]\!]): \mathcal{M} \Vdash [\![(P, P')[\sigma'']]\!].$$ 
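The first step of Asm relies on the fact that two substitutions agreeing on their common variables have a well-defined union, and that this union acts on each program exactly as the original substitution does. A small sketch of this remark follows; it assumes, hypothetically, that each substitution is a dictionary whose domain is the set of variables of its own program and that a program is a set of dependences between variables. The names `merge` and `substitute` are illustrative.

```python
def merge(sigma1, sigma2):
    """Return sigma1 U sigma2, provided both agree on their common variables."""
    common = sigma1.keys() & sigma2.keys()
    clashes = sorted(v for v in common if sigma1[v] != sigma2[v])
    if clashes:
        raise ValueError(f"substitutions disagree on {clashes}")
    return {**sigma1, **sigma2}


def substitute(program, sigma):
    # A program is modelled here as a set of dependences (pairs of variables).
    return frozenset(tuple(sigma.get(v, v) for v in dep) for dep in program)


Q = frozenset({("x", "y")})
Q2 = frozenset({("y", "z")})
sigma = {"x": "A", "y": "B"}
sigma2 = {"y": "B", "z": "C"}
sigma_pp = merge(sigma, sigma2)
print(sigma_pp)  # {'x': 'A', 'y': 'B', 'z': 'C'}
```

Here `substitute(Q, sigma_pp)` equals `substitute(Q, sigma)` and `substitute(Q2, sigma_pp)` equals `substitute(Q2, sigma2)`, which is the instance of $Q[\sigma \cup \sigma'] = Q[\sigma]$ and $Q'[\sigma \cup \sigma'] = Q'[\sigma']$ used in the proof. If the two substitutions disagreed on a shared variable, the union would not be a function and `merge` raises an error instead.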